Multiplication/accumulation operators having multiple operation circuits

ABSTRACT

A multiplication/accumulation (MAC) operator includes a plurality of multiple operation circuits. Plural sets of first input data are transmitted to the plurality of multiple operation circuits, respectively. Plural sets of second input data are transmitted to the plurality of multiple operation circuits, respectively. Plural sets of operation result data are output from the plurality of multiple operation circuits, respectively. Each of the plurality of multiple operation circuits is configured to perform an arithmetic operation in a first operation mode, a second operation mode, or a third operation mode according to first to third selection signals.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation application of U.S. patent application Ser. No. 17/399,844, filed on Aug. 11, 2021, which claims the priority of Korean Application No. 10-2021-0052016, filed on Apr. 21, 2021, both of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Technical Field

Various embodiments of the present teachings relate to multiplication/accumulation (MAC) operators having multiple operation circuits.

2. Related Art

Recently, interest in artificial intelligence (AI) has been increasing not only in the information technology industry but also in the financial and medical industries. Accordingly, the introduction of artificial intelligence, more precisely of deep learning, is being considered and prototyped in various fields. In general, techniques for effectively training deep neural networks (DNNs), or deep networks having more layers than general neural networks, so that they can be used for pattern recognition or inference are commonly referred to as deep learning.

One cause of this widespread interest may be the improved performance of processors that perform arithmetic operations. To improve the performance of artificial intelligence, it may be necessary to increase the number of layers constituting the neural network used to train the artificial intelligence. This trend has continued in recent years and has led to an exponential increase in the amount of computation required of the hardware that actually performs it. Moreover, if the artificial intelligence employs a general hardware system in which a memory and a processor are separated from each other, its performance may be degraded by the limited amount of data communication between the memory and the processor. To solve this problem, a processing-in-memory (PIM) device, in which a processor and a memory are integrated in one semiconductor chip, has been used as a neural network computing device. Because the PIM device performs arithmetic operations internally, the data processing speed of the neural network may be improved.

SUMMARY

According to an embodiment, a multiplication/accumulation (MAC) operator includes a plurality of multiple operation circuits. Plural sets of first input data are transmitted to the plurality of multiple operation circuits, respectively. Plural sets of second input data are transmitted to the plurality of multiple operation circuits, respectively. Plural sets of operation result data are output from the plurality of multiple operation circuits, respectively. Each of the plurality of multiple operation circuits is configured to perform an arithmetic operation in a first operation mode, a second operation mode, or a third operation mode according to first to third selection signals.

BRIEF DESCRIPTION OF THE DRAWINGS

Certain features of the disclosed technology are illustrated by various embodiments with reference to the attached drawings, in which:

FIG. 1 illustrates a configuration of a multiple operation circuit according to an embodiment of the present disclosure;

FIG. 2 illustrates an example of a configuration of a multiplier included in the multiple operation circuit illustrated in FIG. 1;

FIG. 3 illustrates an example of a configuration of an adder included in the multiple operation circuit illustrated in FIG. 1;

FIG. 4 illustrates an example of a matrix-vector multiplying calculation executed by a multiplication/accumulation (MAC) operation in a first operation mode of the multiple operation circuit illustrated in FIG. 1;

FIG. 5 illustrates an example of a process of the matrix-vector multiplying calculation illustrated in FIG. 4;

FIG. 6 illustrates a first MAC operation of the matrix-vector multiplying calculation process illustrated in FIG. 5;

FIG. 7 illustrates a second MAC operation of the matrix-vector multiplying calculation process illustrated in FIG. 5;

FIG. 8 illustrates an example of a matrix-scalar multiplying calculation executed by an element-wise (EW) multiplying calculation in a second operation mode of the multiple operation circuit illustrated in FIG. 1;

FIG. 9 illustrates the EW multiplying calculation illustrated in FIG. 8;

FIG. 10 illustrates an example of a matrix adding calculation executed by an element-wise (EW) adding calculation in a second operation mode of the multiple operation circuit illustrated in FIG. 1;

FIG. 11 illustrates the EW adding calculation illustrated in FIG. 10;

FIG. 12 illustrates an accumulating calculation executed in a third operation mode of the multiple operation circuit illustrated in FIG. 1;

FIG. 13 illustrates a configuration of a multiple operation circuit according to another embodiment of the present disclosure;

FIG. 14 illustrates an example of a configuration of a multiplier included in the multiple operation circuit illustrated in FIG. 13;

FIG. 15 illustrates an example of a normalizer included in the multiple operation circuit illustrated in FIG. 13;

FIG. 16 illustrates a MAC operator according to an embodiment of the present disclosure;

FIG. 17 illustrates a MAC operation performed in a first MAC operation mode of the MAC operator illustrated in FIG. 16;

FIG. 18 illustrates a MAC operation performed in a second MAC operation mode of the MAC operator illustrated in FIG. 16;

FIG. 19 illustrates a processing-in-memory (PIM) device according to an embodiment of the present disclosure;

FIG. 20 illustrates an example of a MAC operation performed by the PIM device illustrated in FIG. 19;

FIG. 21 illustrates a PIM device according to another embodiment of the present disclosure; and

FIG. 22 illustrates an example of a MAC operation performed by the PIM device illustrated in FIG. 21.

DETAILED DESCRIPTION

In the following description of embodiments, it will be understood that the terms “first” and “second” are intended to identify elements, not to define a particular number or sequence of elements. In addition, when an element is referred to as being located “on,” “over,” “above,” “under,” or “beneath” another element, the term is intended to describe a relative positional relationship, not to limit the description to cases in which the element directly contacts the other element or in which at least one intervening element is present between the two elements. Accordingly, terms such as “on,” “over,” “above,” “under,” “beneath,” “below,” and the like that are used herein are for the purpose of describing particular embodiments only and are not intended to limit the scope of the present disclosure. Further, when an element is referred to as being “connected” or “coupled” to another element, the element may be electrically or mechanically connected or coupled to the other element directly, or electrically or mechanically connected or coupled to the other element indirectly with one or more additional elements between the two elements. Moreover, when a parameter is referred to as being “predetermined,” it may mean that a value of the parameter is determined before the parameter is used in a process or an algorithm. The value of the parameter may be set when the process or the algorithm starts or during a period in which the process or the algorithm is executed. A logic “high” level and a logic “low” level may be used to describe logic levels of electric signals. A signal having a logic “high” level may be distinguished from a signal having a logic “low” level. For example, when a signal having a first voltage corresponds to a signal having a logic “high” level, a signal having a second voltage may correspond to a signal having a logic “low” level. In an embodiment, the logic “high” level may be set as a voltage level which is higher than a voltage level of the logic “low” level. Meanwhile, logic levels of signals may be set to be different or opposite according to the embodiment. For example, a certain signal having a logic “high” level in one embodiment may be set to have a logic “low” level in another embodiment.

Various embodiments of the present disclosure will be described hereinafter in detail with reference to the accompanying drawings. However, the embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the present disclosure.

Various embodiments are directed to MAC operators including the multiple operation circuits.

FIG. 1 illustrates a configuration of a multiple operation circuit 100 according to an embodiment of the present disclosure. Referring to FIG. 1, the multiple operation circuit 100 may receive first input data A[15:0] and second input data B[15:0]. Hereinafter, it is assumed that the first input data A[15:0] and the second input data B[15:0] each correspond to a 16-bit binary stream. However, this 16-bit width is merely an example of the present disclosure; in some other embodiments, the number of bits included in the first input data or the second input data may be less than or greater than sixteen. In an embodiment, the first input data A[15:0] may be transmitted from a first memory bank to the multiple operation circuit 100, and the second input data B[15:0] may be transmitted from a second memory bank to the multiple operation circuit 100. Alternatively, the first input data A[15:0] may be transmitted from a memory bank to the multiple operation circuit 100, and the second input data B[15:0] may be transmitted from a buffer memory to the multiple operation circuit 100. The multiple operation circuit 100 may receive first result data IY-1 from another multiple operation circuit (not shown). The multiple operation circuit 100 may receive first to third selection signals SS1˜SS3 and an update signal UPDATE as control signals. The multiple operation circuit 100 may output second result data IY[15:0] and operation result data Y[15:0]. As used herein, the tilde “˜” indicates a range of components. For example, “SS1˜SS3” indicates the first to third selection signals SS1, SS2, and SS3 shown in FIG. 1.

The multiple operation circuit 100 may perform various arithmetic operations in a plurality of operation modes. The plurality of operation modes may include a first operation mode in which a MAC operation is performed, a second operation mode in which element-wise (EW) operations are performed, and a third operation mode in which an accumulating calculation (also referred to as an ‘accumulative adding calculation’) is performed. The EW operations performed in the second operation mode may include an EW multiplying calculation and an EW adding calculation. In an embodiment, the MAC operation may be performed as a matrix-vector multiplying calculation of first matrix data and second matrix data. The EW multiplying calculation may be executed by multiplying each element of the first matrix data by a scalar having a constant value. The EW adding calculation may be executed as an element-to-element adding calculation of the first matrix data and the second matrix data. In addition, the accumulating calculation may be executed by adding the first result data IY-1 to data stored in the multiple operation circuit 100, thereby producing the second result data IY[15:0]. In an embodiment, the first, second, third, and fourth selectors 121 to 124 may change the signal transmission paths of the first result data IY-1, the first and second input data A[15:0] and B[15:0], the multiplication result data AB[15:0], the addition result data DA11[15:0], and the MAC data MAC[15:0] based on the first to third selection signals SS1 to SS3, according to the first, second, or third operation mode. For example, whether the first result data IY-1 or the multiplication result data AB[15:0] is passed on depends on which of the two the first selector 121 outputs through its output terminal OUT1 in response to the first selection signal SS1.
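As a reference point, the arithmetic that each operation mode is meant to produce can be written out on plain Python lists. This is a minimal mathematical sketch, not a model of the circuit: it uses full-precision Python numbers rather than 16-bit data, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mac_matrix_vector(a: Matrix, b: Vector) -> Vector:
    """First operation mode: each output element is a sum of products (M x N times N x 1)."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in a]


def ew_multiply(a: Matrix, scalar: float) -> Matrix:
    """Second operation mode, EW multiplying: every element times one constant."""
    return [[a_ij * scalar for a_ij in row] for row in a]


def ew_add(a: Matrix, b: Matrix) -> Matrix:
    """Second operation mode, EW adding: elements at the same position are added."""
    return [[a_ij + b_ij for a_ij, b_ij in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def accumulate(iy_prev: float, stored: float) -> float:
    """Third operation mode: incoming result IY-1 plus the locally stored result."""
    return iy_prev + stored
```

For example, `mac_matrix_vector([[1, 2], [3, 4]], [5, 6])` returns `[17, 39]`, since 1×5 + 2×6 = 17 and 3×5 + 4×6 = 39.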

When the multiple operation circuit 100 performs the MAC operation in the first operation mode, the first matrix data (i.e., the first input data A[15:0]) and the second matrix data (i.e., the second input data B[15:0]) may be input to the multiple operation circuit 100. When the MAC operation is performed for an ‘M×N’ first matrix and an ‘N×1’ second matrix, the first input data A[15:0] may correspond to elements of the first matrix, and the second input data B[15:0] may correspond to elements of the second matrix. The first input data A[15:0] may have a floating-point format composed of a sign part, an exponent part, and a mantissa part, and the second input data B[15:0] may also have a floating-point format composed of a sign part, an exponent part, and a mantissa part. However, the floating-point format of the first and second input data A[15:0] and B[15:0] is merely an example of the present disclosure; in some other embodiments, the first and second input data A[15:0] and B[15:0] may have a fixed-point format composed of an integer part, which includes a sign datum, and a fractional part.

When the multiple operation circuit 100 performs the EW multiplying calculation in the second operation mode, matrix data corresponding to the first input data A[15:0] and a constant value corresponding to the second input data B[15:0] may be input to the multiple operation circuit 100. When the EW multiplying calculation is performed using an ‘M×N’ first matrix and a constant value as input data, the first input data A[15:0] may correspond to elements of the first matrix and the second input data B[15:0] may correspond to the constant value. When the multiple operation circuit 100 performs the EW adding calculation, the first matrix data corresponding to the first input data A[15:0] and the second matrix data corresponding to the second input data B[15:0] may be input to the multiple operation circuit 100. When the EW adding calculation is performed using an ‘M×N’ first matrix and an ‘M×N’ second matrix as input data, the first input data A[15:0] may correspond to elements of the ‘M×N’ first matrix and the second input data B[15:0] may correspond to elements of the ‘M×N’ second matrix. In such a case, an element “a” in the ‘M×N’ first matrix may be added to the element “b” located at the same position in the ‘M×N’ second matrix.

When the multiple operation circuit 100 performs the accumulating calculation in the third operation mode, the multiple operation circuit 100 may receive the first result data IY-1. In an embodiment, the first result data IY-1 may be first multiplication result data transmitted from another multiple operation circuit (not shown) to the multiple operation circuit 100. The multiple operation circuit 100 may perform the accumulating calculation of the first multiplication result data and second multiplication result data, which were stored in the multiple operation circuit 100 by a previous operation, thereby generating and outputting the second result data IY[15:0]. The second result data IY[15:0] output from the multiple operation circuit 100 may be transmitted to yet another multiple operation circuit (not shown) and may be used as the first result data IY-1 of that multiple operation circuit.
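The data flow of the third operation mode can be sketched as a chain in which each circuit adds its own stored result to the interim result it receives and forwards the sum to the next circuit. This is a minimal sketch using plain floats; the function name is hypothetical, and so is the assumption that the first circuit in the chain receives zero as its IY-1, since the text does not say what the first circuit receives.

```python
from typing import List


def chain_accumulate(stored_results: List[float], initial_iy: float = 0.0) -> List[float]:
    """Pass interim result data IY along a chain of multiple operation circuits.

    Circuit k receives IY-1 from circuit k-1 (or initial_iy, an assumed value,
    for the first circuit), adds its own stored result, and forwards the sum
    as its IY. Returns the IY output by every circuit; the last entry is the total.
    """
    outputs = []
    iy = initial_iy
    for stored in stored_results:
        iy = iy + stored  # adder: incoming IY-1 plus the locally latched result
        outputs.append(iy)
    return outputs
```

With four circuits holding 1.0, 2.0, 3.0, and 4.0, the chain outputs the running sums 1.0, 3.0, 6.0, and 10.0, so the last circuit delivers the total of all stored results.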

The multiple operation circuit 100 may include a multiplier 110, first to fourth selectors 121˜124, an adder 130, and a latch circuit 140.

The multiplier 110 may have a first input terminal, a second input terminal, and an output terminal. The first input data A[15:0] corresponding to the first matrix data and the second input data B[15:0] corresponding to the second matrix data may be input to the first input terminal and the second input terminal of the multiplier 110, respectively. The multiplier 110 may perform a multiplying calculation of the first input data A[15:0] and the second input data B[15:0] to generate multiplication result data AB[15:0]. The multiplier 110 may output the multiplication result data AB[15:0] through its output terminal. The first input terminal of the multiplier 110 may be coupled to a first input terminal IN21 of the second selector 122. Thus, the first input data A[15:0] transmitted to the first input terminal of the multiplier 110 may also be transmitted to the first input terminal IN21 of the second selector 122. The second input terminal of the multiplier 110 may be coupled to a first input terminal IN31 of the third selector 123. Thus, the second input data B[15:0] transmitted to the second input terminal of the multiplier 110 may also be transmitted to the first input terminal IN31 of the third selector 123. The output terminal of the multiplier 110 may be coupled to a second input terminal IN12 of the first selector 121. Thus, the multiplication result data AB[15:0] output from the multiplier 110 through its output terminal may be transmitted to the second input terminal IN12 of the first selector 121.

The first selector 121 may have a first input terminal IN11, the second input terminal IN12, a selection terminal S1, and an output terminal OUT1. The first selector 121 may receive the first result data IY-1 through the first input terminal IN11. Because the second input terminal IN12 of the first selector 121 is coupled to the output terminal of the multiplier 110, the first selector 121 may receive the multiplication result data AB[15:0] from the multiplier 110 through the second input terminal IN12. The first selector 121 may also receive the first selection signal SS1 through its selection terminal S1. The output terminal OUT1 of the first selector 121 may be coupled to both a second input terminal IN22 of the second selector 122 and a first input terminal IN41 of the fourth selector 124. The first selector 121 may output the first result data IY-1, which are input through the first input terminal IN11, through the output terminal OUT1 in response to the first selection signal SS1 having a first logic level. The first selector 121 may output the multiplication result data AB[15:0], which are input through the second input terminal IN12, through the output terminal OUT1 in response to the first selection signal SS1 having a second logic level. Hereinafter, it is assumed that the first logic level is a logic “low” level and the second logic level is a logic “high” level. In an embodiment, the first selector 121 may be realized using a 2-to-1 multiplexer having two input terminals and one output terminal.
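With the convention just stated (first logic level = low, second logic level = high), the first selector can be sketched as a 2-to-1 multiplexer. The helper names and the 0/1 encoding of the logic levels are assumptions for illustration only.

```python
LOW, HIGH = 0, 1  # assumed encoding of the logic "low" and "high" levels


def mux2(in_low, in_high, select: int):
    """2-to-1 multiplexer: in_low for a LOW select signal, in_high for a HIGH one."""
    if select not in (LOW, HIGH):
        raise ValueError("select must be LOW (0) or HIGH (1)")
    return in_high if select == HIGH else in_low


def first_selector(iy_prev, ab, ss1: int):
    """First selector 121: IY-1 (IN11) for SS1 low, AB[15:0] (IN12) for SS1 high."""
    return mux2(iy_prev, ab, ss1)
```

The same `mux2` shape fits the second to fourth selectors, which differ only in what is wired to their two inputs and which selection signal drives them.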

The second selector 122 may have the first input terminal IN21, the second input terminal IN22, a selection terminal S2, and an output terminal OUT2. Because the first input terminal IN21 of the second selector 122 is coupled to the first input terminal of the multiplier 110, the first input data A[15:0] may also be transmitted to the first input terminal IN21 of the second selector 122. The second selection signal SS2 may be input to the selection terminal S2 of the second selector 122. The output terminal OUT2 of the second selector 122 may be coupled to a first input terminal of the adder 130. The second selector 122 may output the first input data A[15:0], which are input through the first input terminal IN21, through the output terminal OUT2 in response to the second selection signal SS2 having a logic “low” level. The second selector 122 may output the output data of the first selector 121, which are input through the second input terminal IN22, through the output terminal OUT2 in response to the second selection signal SS2 having a logic “high” level. In an embodiment, the second selector 122 may be realized using a 2-to-1 multiplexer having two input terminals and one output terminal. In an embodiment, the output data of the second selector 122 may be third input data received by the adder 130.

The third selector 123 may have the first input terminal IN31, a second input terminal IN32, a selection terminal S3, and an output terminal OUT3. Because the first input terminal IN31 of the third selector 123 is coupled to the second input terminal of the multiplier 110, the second input data B[15:0] may also be transmitted to the first input terminal IN31 of the third selector 123. The second input terminal IN32 of the third selector 123 may be coupled to an output terminal of the latch circuit 140. Thus, the third selector 123 may receive, from the output terminal of the latch circuit 140, feedback data DF[15:0] corresponding to the operation result data Y[15:0]. The second selection signal SS2 may also be transmitted to the selection terminal S3 of the third selector 123. The output terminal OUT3 of the third selector 123 may be coupled to the second input terminal of the adder 130. The third selector 123 may output the second input data B[15:0], which are input through the first input terminal IN31, through the output terminal OUT3 in response to the second selection signal SS2 having a logic “low” level. The third selector 123 may output the feedback data DF[15:0], which are input through the second input terminal IN32, through the output terminal OUT3 in response to the second selection signal SS2 having a logic “high” level. In an embodiment, the third selector 123 may be realized using a 2-to-1 multiplexer having two input terminals and one output terminal. In an embodiment, the output data of the third selector 123 may be fourth input data received by the adder 130.

The adder 130 may have a first input terminal, a second input terminal, and an output terminal. The first input terminal of the adder 130 may be coupled to the output terminal OUT2 of the second selector 122, and the second input terminal of the adder 130 may be coupled to the output terminal OUT3 of the third selector 123. Thus, the output data of the second selector 122 may be input to the first input terminal of the adder 130, and the output data of the third selector 123 may be input to the second input terminal of the adder 130. When the second selection signal SS2 has a logic “low” level, the first input data A[15:0] and the second input data B[15:0] may be transmitted to the first input terminal and the second input terminal of the adder 130, respectively. When the second selection signal SS2 has a logic “high” level, the output data of the first selector 121 and the feedback data DF[15:0] output from the latch circuit 140 may be transmitted to the first input terminal and the second input terminal of the adder 130, respectively. The output terminal of the adder 130 may be coupled to both a second input terminal IN42 of the fourth selector 124 and a first output line 161. The adder 130 may add the two sets of data input through its first and second input terminals to generate MAC data MAC[15:0]. The adder 130 may transmit the MAC data MAC[15:0] to the second input terminal IN42 of the fourth selector 124 and may also output the MAC data MAC[15:0] through the first output line 161 as the second result data IY[15:0] (also referred to as ‘interim result data’), which are output data of the multiple operation circuit 100.

The fourth selector 124 may have the first input terminal IN41, the second input terminal IN42, a selection terminal S4, and an output terminal OUT4. The selection terminal S4 of the fourth selector 124 may be coupled to an output terminal of an inverter 150, and the third selection signal SS3 may be transmitted to an input terminal of the inverter 150. In the present embodiment, the inverter 150 may be employed to more readily distinguish the logic levels of the second and third selection signals SS2 and SS3 from each other. Thus, in some other embodiments, the multiple operation circuit 100 may be realized without the inverter 150. The output terminal OUT4 of the fourth selector 124 may be coupled to an input terminal of the latch circuit 140. The fourth selector 124 may output the output data of the first selector 121, which are input through the first input terminal IN41, through the output terminal OUT4 in response to the third selection signal SS3 having a logic “high” level. The fourth selector 124 may output the MAC data MAC[15:0], which are input through the second input terminal IN42, through the output terminal OUT4 in response to the third selection signal SS3 having a logic “low” level. In an embodiment, the fourth selector 124 may be realized using a 2-to-1 multiplexer having two input terminals and one output terminal. In an embodiment, the output data of the fourth selector 124 may be fifth input data received by the latch circuit 140.

The latch circuit 140 may have the input terminal, a clock terminal, and an output terminal Q. In an embodiment, the latch circuit 140 may be realized using a flip-flop having a latch function. The input terminal of the latch circuit 140 may be coupled to the output terminal OUT4 of the fourth selector 124. The update signal UPDATE may be transmitted to the clock terminal of the latch circuit 140. The output terminal Q of the latch circuit 140 may be coupled to both the second input terminal IN32 of the third selector 123 and a second output line 162. The latch circuit 140 may latch the output data of the fourth selector 124, which are input to its input terminal, in synchronization with a rising edge of the update signal UPDATE. The latched data may be output through the output terminal Q when a certain time has elapsed from the rising edge of the update signal UPDATE. The output data of the latch circuit 140 may correspond to the feedback data DF[15:0] transmitted to the second input terminal IN32 of the third selector 123. In addition, the output data of the latch circuit 140 may be output through the second output line 162 as the operation result data Y[15:0], which are output data of the multiple operation circuit 100.
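Putting the selector, adder, and latch descriptions together, one update cycle of the multiple operation circuit 100 can be sketched behaviorally. This is a minimal sketch under several assumptions: Python floats stand in for the 16-bit data, the multiplier and adder are exact, and one call to `clock` stands for one rising edge of UPDATE. The class and method names are hypothetical, and the selection-signal settings in the usage note are inferred from the selector descriptions rather than given as a table in the text.

```python
from dataclasses import dataclass
from typing import Tuple

LOW, HIGH = 0, 1  # assumed encoding of logic levels


def mux2(in_low, in_high, select: int):
    """2-to-1 multiplexer: in_low for a LOW select, in_high for a HIGH select."""
    return in_high if select == HIGH else in_low


@dataclass
class MultipleOperationCircuit:
    """Behavioral sketch of multiple operation circuit 100 (hypothetical model)."""
    latch: float = 0.0  # content of latch circuit 140, seen as Y[15:0] and as DF[15:0]

    def evaluate(self, a: float, b: float, iy_prev: float,
                 ss1: int, ss2: int, ss3: int) -> Tuple[float, float]:
        """Combinational path only; returns (IY[15:0], data at the latch input)."""
        ab = a * b                          # multiplier 110
        out1 = mux2(iy_prev, ab, ss1)       # first selector 121
        add_in1 = mux2(a, out1, ss2)        # second selector 122
        add_in2 = mux2(b, self.latch, ss2)  # third selector 123 (DF = latch output)
        mac = add_in1 + add_in2             # adder 130: MAC[15:0], also output as IY[15:0]
        s4 = LOW if ss3 == HIGH else HIGH   # inverter 150
        latch_in = mux2(out1, mac, s4)      # fourth selector 124: IN41 if S4 low, IN42 if high
        return mac, latch_in

    def clock(self, a: float, b: float, iy_prev: float,
              ss1: int, ss2: int, ss3: int) -> float:
        """Rising edge of UPDATE: latch the fourth selector's output and return Y."""
        _, latch_in = self.evaluate(a, b, iy_prev, ss1, ss2, ss3)
        self.latch = latch_in
        return self.latch
```

Under these assumptions, SS1 and SS2 high with SS3 low make repeated `clock` calls accumulate A×B into the latch (first mode); SS1 and SS3 high latch A×B alone (EW multiplying); SS2 and SS3 low latch A+B (EW adding); and SS1 low with SS2 high makes the adder output IY-1 plus the latched value as IY (third mode).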

FIG. 2 illustrates an example of a configuration of the multiplier 110 included in the multiple operation circuit 100 illustrated in FIG. 1. In the present embodiment, it is assumed that the first input data A[15:0] are composed of a first sign datum S1[0] having one bit, first exponent data E1[7:0] having 8 bits, and first mantissa data M1[6:0] having 7 bits. In addition, it is assumed that the second input data B[15:0] are composed of a second sign datum S2[0] having one bit, second exponent data E2[7:0] having 8 bits, and second mantissa data M2[6:0] having 7 bits. Similarly, it is assumed that the multiplication result data AB[15:0] output from the multiplier 110 are composed of a third sign datum S3[0] having one bit, third exponent data E3[7:0] having 8 bits, and third mantissa data M3[6:0] having 7 bits.
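The assumed layout of a 1-bit sign, 8-bit exponent, and 7-bit mantissa has the same field widths as the bfloat16 format, which is the upper half of an IEEE 754 single-precision word. The text does not name a format, so treating the data as bfloat16 is an assumption here. Under that assumption, the field split can be sketched with Python's standard `struct` module; the helper names are hypothetical, and the float-to-16-bit conversion truncates rather than rounds.

```python
import struct
from typing import Tuple


def to_bf16_bits(x: float) -> int:
    """Truncate a float to a 16-bit 1/8/7 pattern (upper half of its float32 encoding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bf16_bits(bits: int) -> float:
    """Widen a 16-bit 1/8/7 pattern back to a float by appending 16 zero bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))
    return x


def split_fields(bits: int) -> Tuple[int, int, int]:
    """Return (sign S[0], exponent E[7:0], mantissa M[6:0]) of a 16-bit pattern."""
    return (bits >> 15) & 0x1, (bits >> 7) & 0xFF, bits & 0x7F
```

For example, 1.0 maps to the pattern 0x3F80, whose fields are sign 0, exponent 127 (the bias), and mantissa 0.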

Referring to FIG. 2, the multiplier 110 may include a sign processing circuit 110S, an exponent processing circuit 110E, a mantissa processing circuit 110M, and a normalizer 110N. The sign processing circuit 110S may include an exclusive OR (XOR) gate 111. The XOR gate 111 may perform an XOR operation on the first sign datum S1[0] of the first input data A[15:0] and the second sign datum S2[0] of the second input data B[15:0] and may output the result as the third sign datum S3[0] of the multiplication result data AB[15:0].

The exponent processing circuit 110E may include a first exponent adder 112 and a second exponent adder 113. The first exponent adder 112 may add the first exponent data E1[7:0] of the first input data A[15:0] and the second exponent data E2[7:0] of the second input data B[15:0] and may output the sum. Because each of E1[7:0] and E2[7:0] already includes the exponent bias, this sum includes the bias twice. The second exponent adder 113 may therefore add a negative exponent bias value, corresponding to decimal ‘−127’, to the output data of the first exponent adder 112, thereby subtracting the exponent bias value of decimal ‘127’ once and generating interim exponent data EM[7:0]. The interim exponent data EM[7:0] output from the second exponent adder 113 may be transmitted to the normalizer 110N.
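The exponent path reduces to one line of arithmetic: with biased exponents E1 = e1 + 127 and E2 = e2 + 127, the sum is e1 + e2 + 254, and adding −127 leaves (e1 + e2) + 127, a correctly biased exponent for the product. A minimal sketch with a hypothetical function name follows; it ignores the 8-bit width of EM[7:0] and any overflow or underflow handling, which the text does not describe.

```python
EXPONENT_BIAS = 127  # bias value of decimal 127, as stated in the text


def interim_exponent(e1: int, e2: int) -> int:
    """EM = (E1 + E2) + (-127), as computed by exponent adders 112 and 113."""
    summed = e1 + e2                  # first exponent adder 112
    return summed + (-EXPONENT_BIAS)  # second exponent adder 113
```

For example, multiplying 1.0 (E1 = 127) by 1.0 (E2 = 127) gives EM = 127, which again encodes an unbiased exponent of 0.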

The mantissa processing circuit 110M may include a mantissa multiplier 114. The mantissa multiplier 114 may receive 8-bit first mantissa data M1[7:0] and 8-bit second mantissa data M2[7:0]. The 8-bit first mantissa data M1[7:0] may be composed of the 7-bit first mantissa data M1[6:0] included in the first input data A[15:0] and a 1-bit implied datum IB. The 8-bit second mantissa data M2[7:0] may be composed of the 7-bit second mantissa data M2[6:0] included in the second input data B[15:0] and the 1-bit implied datum IB. The implied datum IB is the binary “1” that precedes the binary point of a normalized floating-point mantissa. The mantissa multiplier 114 may multiply the 8-bit first mantissa data M1[7:0] by the 8-bit second mantissa data M2[7:0] to generate 16-bit first interim mantissa data MM13[15:0] as a result. The first interim mantissa data MM13[15:0] generated by the mantissa multiplier 114 may be transmitted to the normalizer 110N.
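A sketch of the mantissa multiplier follows, with a hypothetical function name. It also shows why the MSB of the product matters: each 8-bit operand {IB, M[6:0]} lies between 128 and 255, so the 16-bit product lies between 16384 (2^14) and 65025. MM13[15] is therefore 1 exactly when the product of the two mantissas, read as values in [1, 2), is 2 or more.

```python
def mantissa_product(m1: int, m2: int) -> int:
    """Mantissa multiplier 114: prepend the implied bit IB to each 7-bit field and multiply."""
    if not (0 <= m1 < 128 and 0 <= m2 < 128):
        raise ValueError("mantissa fields must be 7-bit values")
    full1 = (1 << 7) | m1  # M1[7:0] = {IB, M1[6:0]}
    full2 = (1 << 7) | m2  # M2[7:0] = {IB, M2[6:0]}
    return full1 * full2   # MM13[15:0], always fits in 16 bits
```

For example, 1.5 × 1.5 uses mantissa fields of 64 each, giving 192 × 192 = 36864, whose bit 15 is set because 2.25 ≥ 2.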

The normalizer 110N may include a floating-point shifter 115, a multiplexer 116, a round processor 117, and a third exponent adder 118. The floating-point shifter 115 of the normalizer 110N may receive the 16-bit first interim mantissa data MM13[15:0] from the mantissa multiplier 114 and may shift the binary point of the first interim mantissa data MM13[15:0] by one bit toward its most significant bit (MSB) to generate and output second interim mantissa data MM23[15:0]. Because each 8-bit operand mantissa has the form 1.xxxxxxx, the binary point of the first interim mantissa data MM13[15:0] lies between the bits MM13[14] and MM13[13]. After the shift, the binary point of the second interim mantissa data MM23[15:0] may be located between the fifteenth bit MM23[14] and the MSB MM23[15].

The multiplexer 116 of the normalizer 110N may receive the first interim mantissa data MM13[15:0] from the mantissa multiplier 114 through a first input terminal IN1 of the multiplexer 116. The multiplexer 116 may also receive the second interim mantissa data MM23[15:0] from the floating-point shifter 115 through a second input terminal IN2 of the multiplexer 116. The multiplexer 116 may receive the MSB signal MM13[15] of the first interim mantissa data MM13[15:0] as a selection signal. When the MSB signal MM13[15] is “0”, the multiplexer 116 may output the first interim mantissa data MM13[15:0] input through the first input terminal IN1. In contrast, when the MSB signal MM13[15] is “1”, the multiplexer 116 may output the second interim mantissa data MM23[15:0] input through the second input terminal IN2.
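Shifting the binary point one bit toward the MSB changes how the 16 bits are read, not the bits themselves. The multiplexer's choice can therefore be sketched as choosing the number of bits to the right of the binary point. The function name and the returned pair are a hypothetical, illustrative representation, not signals defined in the text. Either way, the selected mantissa, read as a value, lies in [1, 2).

```python
from typing import Tuple


def select_interim_mantissa(mm13: int) -> Tuple[int, int]:
    """Return (16-bit mantissa, number of bits right of the binary point).

    The floating-point shifter 115 only moves the binary point, so the bit
    pattern is unchanged; the multiplexer 116 effectively picks its reading.
    """
    if not 0 <= mm13 < (1 << 16):
        raise ValueError("MM13 must be a 16-bit value")
    if (mm13 >> 15) & 1 == 0:
        return mm13, 14  # IN1: MM13, binary point between bits 14 and 13
    return mm13, 15      # IN2: MM23, binary point between bits 15 and 14
```

For example, 36864 (from 1.5 × 1.5) is read with 15 fraction bits as 1.125, and the exponent is raised by one to compensate, giving 1.125 × 2 = 2.25.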

The round processor 117 of the normalizer 110N may remove 9 bits, including the implied bit, from the 16-bit interim mantissa data output from the multiplexer 116 and may perform a rounding operation as those 9 bits are removed. During the rounding operation, a value of “1” may be added by a round-off operation or a round-up operation. The round processor 117 may output the 7-bit third mantissa data M3[6:0] included in the multiplication result data AB[15:0].
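The round processor can be sketched as keeping the 7 bits just below the implied bit. The other 9 bits are dropped: the implied bit, bit 15 when it is not the implied bit, and the low-order bits. The text says only that a “1” may be added by a round-off or round-up operation, so the round-half-up rule on the first discarded bit used here is an assumption. The text also does not say what happens when rounding carries out of 1.1111111, so this sketch reports the carry instead of handling it. The function name and the `frac_bits` input (the number of bits right of the binary point after normalization, 14 or 15) are hypothetical.

```python
from typing import Tuple


def round_mantissa(bits: int, frac_bits: int, round_half_up: bool = True) -> Tuple[int, int]:
    """Sketch of round processor 117; returns (M3[6:0], carry out of rounding).

    bits / 2**frac_bits is the normalized 16-bit mantissa in [1, 2).
    Rounding rule (assumed): add 1 when the first discarded bit is 1.
    """
    drop = frac_bits - 7               # 7 or 8 low-order bits are discarded
    kept = (bits >> drop) & 0x7F       # 7 fraction bits just below the implied bit
    if round_half_up and (bits >> (drop - 1)) & 1:
        kept += 1                      # the "adding a value of 1" step
    carry = kept >> 7                  # 1 if 1.1111111 rounded up to 10.0000000
    return kept & 0x7F, carry
```

Setting `round_half_up=False` turns the sketch into plain truncation, which shows how much the assumed rounding rule changes the result.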

The third exponent adder 118 of the normalizer 110N may perform an adding calculation for adding an MSB datum MM13[15] of the first interim mantissa data MM13[15:0] output from the mantissa multiplier 114 to the interim exponent data EM[7:0] output from the second exponent adder 113. The third exponent adder 118 may generate and output the third exponent data E3[7:0] having 8 bits included in the multiplication result data AB[15:0]. When the MSB datum MM13[15] of the first interim mantissa data MM13[15:0] has a binary number of “0”, the third exponent data E3[7:0] output from the third exponent adder 118 may have the same value as the interim exponent data EM[7:0] output from the second exponent adder 113. When the MSB datum MM13[15] of the first interim mantissa data MM13[15:0] has a binary number of “1”, the third exponent data E3[7:0] output from the third exponent adder 118 may have a value which is one larger than the interim exponent data EM[7:0] output from the second exponent adder 113.
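The normalization performed by the floating-point shifter 115, the multiplexer 116, the round processor 117, and the third exponent adder 118 can be checked numerically with the Python sketch below. It is a sketch under assumptions, not the circuit itself: the interim exponent EM[7:0] is assumed to be E1 + E2 - 127 (the bias removed once by the earlier exponent adders), only normal numbers are handled (no zero, subnormal, infinity, NaN, or exponent overflow), rounding is assumed to be round-half-up on the first discarded bit, and the renormalization after a rounding carry is an added assumption. The function names `float_to_bf16`, `bf16_to_float`, and `bf16_multiply` are hypothetical.

```python
import struct


def float_to_bf16(x: float) -> int:
    """Keep the upper 16 bits of the float32 encoding (exact for BF16-representable values)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def bf16_to_float(bits: int) -> float:
    """Widen a BF16 bit pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


def bf16_multiply(a: int, b: int) -> int:
    """Hypothetical model of multiplier 110 with normalizer 110N (normal inputs only)."""
    s1, e1, m1 = a >> 15, (a >> 7) & 0xFF, a & 0x7F
    s2, e2, m2 = b >> 15, (b >> 7) & 0xFF, b & 0x7F
    s3 = s1 ^ s2
    em = e1 + e2 - 127  # assumption: interim exponent EM with the bias removed once

    # Mantissa multiplier 114: 8-bit significands (implied bit restored) -> 16-bit MM13.
    mm13 = (0x80 | m1) * (0x80 | m2)
    msb = mm13 >> 15  # MM13[15], the selection signal of multiplexer 116

    # MSB = 0: binary point between bits 13 and 14, keep bits 13..7.
    # MSB = 1: binary point between bits 14 and 15 (MM23), keep bits 14..8.
    # Either way 9 bits are dropped: the implied bit, any bit above it, and the low bits.
    low = 7 + msb
    m3 = (mm13 >> low) & 0x7F
    m3 += (mm13 >> (low - 1)) & 1  # assumed round-half-up on the first dropped bit

    e3 = em + msb  # third exponent adder 118: EM + MM13[15]
    if m3 == 0x80:  # rounding carried past 7 bits (assumed renormalization)
        m3, e3 = 0, e3 + 1
    return (s3 << 15) | ((e3 & 0xFF) << 7) | m3
```

For example, 1.5 × 1.5 gives MM13 = 0x9000, whose MSB is “1”, so the shifted data are selected and the exponent is incremented, yielding the BF16 pattern of 2.25 (0x4010).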

FIG. 3 illustrates an example of a configuration of the adder 130 included in the multiple operation circuit 100 illustrated in FIG. 1. Hereinafter, it may be assumed that the adder 130 receives the multiplication result data AB[15:0] and the feedback data DF[15:0] which are output from respective ones of the second selector 122 and the third selector 123. In an embodiment, the multiplication result data AB[15:0] may correspond to the output data of the multiplier 110 described with reference to FIG. 2. In such a case, the multiplication result data AB[15:0] may be comprised of the third sign datum S3[0] having one bit, the third exponent data E3[7:0] having 8 bits, and the third mantissa data M3[6:0] having 7 bits, as mentioned previously. In the present embodiment, it may be assumed that the feedback data DF[15:0] are comprised of a fourth sign datum S4[0] having one bit, fourth exponent data E4[7:0] having 8 bits, and fourth mantissa data M4[6:0] having 7 bits. Similarly, it may be assumed that the MAC data MAC[15:0] are comprised of a sign datum MAC_S[0] having one bit, exponent data MAC_E[7:0] having 8 bits, and mantissa data MAC_M[6:0] having 7 bits.

Referring to FIG. 3, the adder 130 may include a difference circuit 130D, a 2's complement processing circuit 130C, a shifting circuit 130S, an adding circuit 130A, and a normalizer 130N. The difference circuit 130D may receive the third exponent data E3[7:0] of the multiplication result data AB[15:0] and the fourth exponent data E4[7:0] of the feedback data DF[15:0]. The difference circuit 130D may compare the third exponent data E3[7:0] with the fourth exponent data E4[7:0] to output maximum exponent data E_MAX which correspond to the larger of the third exponent data E3[7:0] and the fourth exponent data E4[7:0]. The difference circuit 130D may also output exponent difference data DE corresponding to a difference value between the third exponent data E3[7:0] and the fourth exponent data E4[7:0]. In addition, the difference circuit 130D may output a selection signal SEL which is determined according to which of the data associated with the third exponent data E3[7:0] and the fourth exponent data E4[7:0] are the target data to be shifted. The maximum exponent data E_MAX output from the difference circuit 130D may be transmitted to the normalizer 130N. The exponent difference data DE and the selection signal SEL output from the difference circuit 130D may be transmitted to the shifting circuit 130S.
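The three outputs of the difference circuit 130D can be sketched in a few lines of Python. The encoding of SEL is an assumption, since the text only states that SEL identifies the data to be shifted: here SEL is driven high (1) when E4[7:0] exceeds E3[7:0] and low (0) otherwise. The function name is hypothetical.

```python
def difference_circuit(e3: int, e4: int) -> tuple[int, int, int]:
    """Hypothetical model of difference circuit 130D.

    Returns (E_MAX, DE, SEL) for the exponent E3 of AB[15:0] and E4 of DF[15:0].
    """
    e_max = max(e3, e4)        # maximum exponent data E_MAX, sent to normalizer 130N
    de = abs(e3 - e4)          # exponent difference data DE, sent to shifting circuit 130S
    sel = 1 if e4 > e3 else 0  # assumed SEL encoding (1 = logic high, 0 = logic low)
    return e_max, de, sel
```

With this encoding, exponents 130 and 127 give E_MAX = 130, DE = 3, and SEL = 0.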

The 2's complement processing circuit 130C may include a first 2's complement processor 131C, a second 2's complement processor 132C, a first multiplexer 133C, and a second multiplexer 134C. The first 2's complement processor 131C may receive the third mantissa data M3[6:0] of the multiplication result data AB[15:0]. The first 2's complement processor 131C may calculate a 2's complement value of the third mantissa data M3[6:0] to generate and output third 2's complement data 2M3[6:0]. The second 2's complement processor 132C may receive the fourth mantissa data M4[6:0] of the feedback data DF[15:0]. The second 2's complement processor 132C may calculate a 2's complement value of the fourth mantissa data M4[6:0] to generate and output fourth 2's complement data 2M4[6:0].

The first multiplexer 133C may receive the third mantissa data M3[6:0] of the multiplication result data AB[15:0] through a first input terminal of the first multiplexer 133C. The first multiplexer 133C may receive the third 2's complement data 2M3[6:0] from the first 2's complement processor 131C through a second input terminal of the first multiplexer 133C. The first multiplexer 133C may receive the third sign datum S3[0] of the multiplication result data AB[15:0] through a selection terminal of the first multiplexer 133C. The first multiplexer 133C may output the third mantissa data M3[6:0] input through the first input terminal or the third 2's complement data 2M3[6:0] input through the second input terminal according to the third sign datum S3[0]. In an embodiment, when the third sign datum S3[0] has a binary number of “0” meaning a positive number, the first multiplexer 133C may output the third mantissa data M3[6:0]. In contrast, when the third sign datum S3[0] has a binary number of “1” meaning a negative number, the first multiplexer 133C may output the third 2's complement data 2M3[6:0]. Hereinafter, the output data of the first multiplexer 133C will be referred to as first interim mantissa data MM1[6:0].

The second multiplexer 134C may receive the fourth mantissa data M4[6:0] of the feedback data DF[15:0] through a first input terminal of the second multiplexer 134C. The second multiplexer 134C may receive the fourth 2's complement data 2M4[6:0] from the second 2's complement processor 132C through a second input terminal of the second multiplexer 134C. The second multiplexer 134C may receive the fourth sign datum S4[0] of the feedback data DF[15:0] through a selection terminal of the second multiplexer 134C. The second multiplexer 134C may output the fourth mantissa data M4[6:0] input through the first input terminal or the fourth 2's complement data 2M4[6:0] input through the second input terminal according to the fourth sign datum S4[0]. In an embodiment, when the fourth sign datum S4[0] has a binary number of “0” meaning a positive number, the second multiplexer 134C may output the fourth mantissa data M4[6:0]. In contrast, when the fourth sign datum S4[0] has a binary number of “1” meaning a negative number, the second multiplexer 134C may output the fourth 2's complement data 2M4[6:0]. Hereinafter, the output data of the second multiplexer 134C will be referred to as second interim mantissa data MM2[6:0].
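The sign-dependent selection performed by the 2's complement processors 131C and 132C together with the multiplexers 133C and 134C can be sketched as follows. The sketch assumes the 7-bit field width of M3[6:0] and M4[6:0] and the usual invert-and-add-one definition of the 2's complement; the function names are hypothetical.

```python
MANTISSA_BITS = 7  # width of M3[6:0] and M4[6:0]


def twos_complement(value: int, width: int = MANTISSA_BITS) -> int:
    """2's complement within a width-bit field: invert every bit and add one."""
    return (~value + 1) & ((1 << width) - 1)


def sign_select(mantissa: int, sign: int, width: int = MANTISSA_BITS) -> int:
    """Model of multiplexer 133C or 134C: pass the mantissa for sign 0, its 2's complement for sign 1."""
    return twos_complement(mantissa, width) if sign else mantissa
```

For example, a mantissa of `0x10` with a sign datum of “1” becomes `0x70`, because 0x10 + 0x70 = 0x80 wraps to zero in seven bits.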

The shifting circuit 130S may include a third multiplexer 131S, a fourth multiplexer 132S, and a shifter 133S. The third multiplexer 131S may receive the first interim mantissa data MM1[6:0] from the first multiplexer 133C of the 2's complement processing circuit 130C through a first input terminal of the third multiplexer 131S. The third multiplexer 131S may receive the second interim mantissa data MM2[6:0] from the second multiplexer 134C of the 2's complement processing circuit 130C through a second input terminal of the third multiplexer 131S. The third multiplexer 131S may receive the selection signal SEL from the difference circuit 130D through a selection terminal of the third multiplexer 131S. The third multiplexer 131S may output the first interim mantissa data MM1[6:0] or the second interim mantissa data MM2[6:0] according to the selection signal SEL. In an embodiment, when the selection signal SEL has a first logic level (e.g., a logic “low” level), the third multiplexer 131S may output the first interim mantissa data MM1[6:0]. In contrast, when the selection signal SEL has a second logic level (e.g., a logic “high” level), the third multiplexer 131S may output the second interim mantissa data MM2[6:0]. Hereinafter, the output data of the third multiplexer 131S will be referred to as third interim mantissa data MM3[6:0].

The fourth multiplexer 132S may receive the second interim mantissa data MM2[6:0] from the second multiplexer 134C of the 2's complement processing circuit 130C through a first input terminal of the fourth multiplexer 132S. The fourth multiplexer 132S may receive the first interim mantissa data MM1[6:0] from the first multiplexer 133C of the 2's complement processing circuit 130C through a second input terminal of the fourth multiplexer 132S. The fourth multiplexer 132S may receive the selection signal SEL from the difference circuit 130D through a selection terminal of the fourth multiplexer 132S. The fourth multiplexer 132S may output the second interim mantissa data MM2[6:0] or the first interim mantissa data MM1[6:0] according to the selection signal SEL. In an embodiment, when the selection signal SEL has a first logic level (e.g., a logic “low” level), the fourth multiplexer 132S may output the second interim mantissa data MM2[6:0]. In contrast, when the selection signal SEL has a second logic level (e.g., a logic “high” level), the fourth multiplexer 132S may output the first interim mantissa data MM1[6:0]. Hereinafter, the output data of the fourth multiplexer 132S will be referred to as fourth interim mantissa data MM4[6:0].
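Because the third multiplexer 131S and the fourth multiplexer 132S receive the same two inputs on opposite terminals, together they act as a swap controlled by SEL. In the hypothetical Python sketch below, the logic “low” level is represented by 0 and the logic “high” level by 1.

```python
def route_for_shift(mm1: int, mm2: int, sel: int) -> tuple[int, int]:
    """Hypothetical model of multiplexers 131S and 132S.

    Returns (MM3, MM4): MM3 goes directly to the integer adder 131A,
    MM4 goes to the shifter 133S.
    """
    mm3 = mm2 if sel else mm1  # third multiplexer 131S: low -> MM1, high -> MM2
    mm4 = mm1 if sel else mm2  # fourth multiplexer 132S: low -> MM2, high -> MM1
    return mm3, mm4
```

Whichever interim mantissa data are not routed to the integer adder unchanged are therefore always the ones presented to the shifter 133S.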

The shifter 133S may perform a shifting operation for the mantissa bits of the multiplication result data AB[15:0] or the feedback data DF[15:0] such that the third exponent data E3[7:0] of the multiplication result data AB[15:0] input to the adder 130 are consistent with the fourth exponent data E4[7:0] of the feedback data DF[15:0] input to the adder 130. Specifically, the shifter 133S may receive the fourth interim mantissa data MM4[6:0] from the fourth multiplexer 132S. The fourth interim mantissa data MM4[6:0] may be the third 2's complement data 2M3[6:0] (or the third mantissa data M3[6:0] of the multiplication result data AB[15:0]) or the fourth 2's complement data 2M4[6:0] (or the fourth mantissa data M4[6:0] of the feedback data DF[15:0]). The shifter 133S may also receive the exponent difference data DE from the difference circuit 130D. The shifter 133S may shift the fourth interim mantissa data MM4[6:0] by the number of bits corresponding to the exponent difference data DE to generate shifted mantissa data SM[6:0].

In the present embodiment, the shifter 133S may be configured to shift the bits included in the fourth interim mantissa data MM4[6:0] in a left direction. However, the present embodiment is merely an example of the present disclosure. Accordingly, in some other embodiments, the shifter 133S may be configured to shift the bits included in the fourth interim mantissa data MM4[6:0] in a right direction. When the shifter 133S is configured to shift the bits included in the fourth interim mantissa data MM4[6:0] in a left direction, as in the present embodiment, the mantissa data having a relatively smaller value may be transmitted to the shifter 133S as the fourth interim mantissa data MM4[6:0]. Alternatively, when the shifter 133S is configured to shift the bits included in the fourth interim mantissa data MM4[6:0] in a right direction, the mantissa data having a relatively larger value may be transmitted to the shifter 133S as the fourth interim mantissa data MM4[6:0]. The mantissa data input to the shifter 133S may be selected by the selection signal SEL which is transmitted from the difference circuit 130D to the selection terminals of the third and fourth multiplexers 131S and 132S.
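The shifter 133S can be sketched as a fixed-width shift by DE in either direction, matching the two embodiments above. This is a simplified sketch under stated assumptions: the field is treated as an unsigned 7-bit value, bits shifted past either end are discarded, and no sign extension is applied to 2's complement operands. The function name and the `direction` parameter are hypothetical.

```python
def shift_mantissa(mm4: int, de: int, direction: str = "left", width: int = 7) -> int:
    """Shift MM4[6:0] by DE bit positions to produce SM[6:0] (hypothetical model of shifter 133S)."""
    mask = (1 << width) - 1
    if direction == "left":
        return (mm4 << de) & mask  # bits pushed beyond the MSB are dropped
    if direction == "right":
        return mm4 >> de           # bits pushed below the LSB are dropped
    raise ValueError("direction must be 'left' or 'right'")
```

For example, `shift_mantissa(0b0000011, 2)` returns `0b0001100`, and `shift_mantissa(0b1100000, 2, "right")` returns `0b0011000`.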

The adding circuit 130A may include an integer adder 131A, a third 2's complement processor 132A, and a fifth multiplexer 133A. The integer adder 131A may receive the third interim mantissa data MM3[6:0] and the shifted mantissa data SM[6:0] from respective ones of the third multiplexer 131S and the shifter 133S included in the shifting circuit 130S. In addition, the integer adder 131A may receive the third sign datum S3[0] and the fourth sign datum S4[0]. The integer adder 131A may generate and output the sign datum MAC_S[0] of the MAC data MAC[15:0] according to a result of an adding calculation of the third sign datum S3[0], the fourth sign datum S4[0], the third interim mantissa data MM3[6:0], and the shifted mantissa data SM[6:0]. Moreover, the integer adder 131A may perform an adding calculation of the third interim mantissa data MM3[6:0] and the shifted mantissa data SM[6:0] to generate and output addition mantissa data AM[6:0]. In an embodiment, when both of the third sign datum S3[0] and the fourth sign datum S4[0] have a binary number of “0” meaning a positive number, the integer adder 131A may output a binary number of “0” as the sign datum MAC_S[0] of the MAC data MAC[15:0]. When both of the third sign datum S3[0] and the fourth sign datum S4[0] have a binary number of “1” meaning a negative number, the integer adder 131A may output a binary number of “1” as the sign datum MAC_S[0] of the MAC data MAC[15:0]. When one of the third sign datum S3[0] and the fourth sign datum S4[0] has a binary number of “0” and the other has a binary number of “1”, the integer adder 131A may output a binary number of “0” as the sign datum MAC_S[0] if roundup occurs as a result of the adding calculation of the third interim mantissa data MM3[6:0] and the shifted mantissa data SM[6:0], and may output a binary number of “1” as the sign datum MAC_S[0] if no roundup occurs as a result of that adding calculation. The integer adder 131A may output the sign datum MAC_S[0] of the MAC data MAC[15:0] through a first output terminal and may output the addition mantissa data AM[6:0] through a second output terminal.

The third 2's complement processor 132A may receive the addition mantissa data AM[6:0] output from the integer adder 131A through the second output terminal of the integer adder 131A. The third 2's complement processor 132A may calculate a 2's complement of the addition mantissa data AM[6:0] and output it as 2's complement addition mantissa data 2AM[6:0]. The fifth multiplexer 133A may receive the addition mantissa data AM[6:0], which are output from the integer adder 131A through the second output terminal of the integer adder 131A, through a first input terminal of the fifth multiplexer 133A. The fifth multiplexer 133A may receive the 2's complement addition mantissa data 2AM[6:0] from the third 2's complement processor 132A through a second input terminal of the fifth multiplexer 133A. The fifth multiplexer 133A may receive the sign datum MAC_S[0] of the MAC data MAC[15:0], which is output from the integer adder 131A through the first output terminal of the integer adder 131A, through a selection terminal of the fifth multiplexer 133A. According to the sign datum MAC_S[0] of the MAC data MAC[15:0], the fifth multiplexer 133A may output, through an output terminal of the fifth multiplexer 133A, either the addition mantissa data AM[6:0] input through the first input terminal or the 2's complement addition mantissa data 2AM[6:0] input through the second input terminal. In an embodiment, when the sign datum MAC_S[0] has a binary number of “0” meaning a positive number, the fifth multiplexer 133A may output the addition mantissa data AM[6:0]. In contrast, when the sign datum MAC_S[0] has a binary number of “1” meaning a negative number, the fifth multiplexer 133A may output the 2's complement addition mantissa data 2AM[6:0]. Hereinafter, the output data of the fifth multiplexer 133A will be referred to as fifth interim mantissa data MM5[6:0].
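The sign rules of the integer adder 131A and the magnitude recovery by the third 2's complement processor 132A and the fifth multiplexer 133A can be illustrated with the sketch below. It rests on an interpretation that is an assumption: the “roundup” is taken to be the carry out of the most significant mantissa bit. With that reading, for operands of opposite sign a carry occurs exactly when the positive magnitude is at least as large as the negative one. The sketch further assumes nonzero magnitudes and sums that fit in 7 bits; the function names are hypothetical.

```python
MANTISSA_BITS = 7
MASK = (1 << MANTISSA_BITS) - 1


def twos_complement(value: int) -> int:
    """2's complement within the 7-bit mantissa field."""
    return (~value + 1) & MASK


def add_mantissas(sign_a: int, operand_a: int, sign_b: int, operand_b: int) -> tuple[int, int]:
    """Hypothetical model of adding circuit 130A; returns (MAC_S[0], MM5[6:0]).

    Operands arrive as plain magnitudes when their sign is 0 and as 7-bit
    2's complements when their sign is 1 (the form produced by circuit 130C).
    """
    total = operand_a + operand_b
    carry = total >> MANTISSA_BITS  # assumed meaning of "roundup"
    am = total & MASK               # addition mantissa data AM[6:0]
    if sign_a == sign_b:
        mac_s = sign_a              # both positive -> 0, both negative -> 1
    else:
        mac_s = 0 if carry else 1   # opposite signs: the carry decides
    # Fifth multiplexer 133A: a negative result is converted back to a magnitude
    # by taking the 2's complement (third 2's complement processor 132A).
    mm5 = twos_complement(am) if mac_s else am
    return mac_s, mm5
```

For example, adding +5 to -20 computes 5 + 108 = 113 with no carry, so MAC_S[0] is “1” and the 2's complement of 113 recovers the magnitude 15.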

The normalizer 130N may have a first input terminal, a second input terminal, a first output terminal, and a second output terminal. The normalizer 130N may receive the maximum exponent data E_MAX from the difference circuit 130D through the first input terminal of the normalizer 130N. The normalizer 130N may receive the fifth interim mantissa data MM5[6:0] from the fifth multiplexer 133A through the second input terminal of the normalizer 130N. The normalizer 130N may output the maximum exponent data E_MAX input through the first input terminal as the exponent data MAC_E[7:0] of the MAC data MAC[15:0] through the first output terminal of the normalizer 130N. In addition, the normalizer 130N may perform a rounding operation for the fifth interim mantissa data MM5[6:0] input through the second input terminal, thereby generating and outputting the mantissa data MAC_M[6:0] of the MAC data MAC[15:0] through the second output terminal of the normalizer 130N.

FIG. 4 illustrates an example of the matrix-vector multiplying calculation executed by the MAC operation in the first operation mode of the multiple operation circuit 100 illustrated in FIG. 1. Referring to FIG. 4, the multiple operation circuit 100 may execute the matrix-vector multiplying calculation of a weight matrix 210 and a vector matrix 220 to perform the MAC operation for generating a result matrix 230. The weight matrix 210 may have “M”-number of rows (i.e., first to M^(th) rows RW(1), RW(2), . . . , and RW(M)) and “N”-number of columns (i.e., first to N^(th) columns CW(1), CW(2), . . . , and CW(N)) (where “M” and “N” are natural numbers equal to or greater than two). The vector matrix 220 may have “N”-number of rows (i.e., first to N^(th) rows RV(1), RV(2), . . . , and RV(N)) and one column CV(1). The result matrix 230 may have “M”-number of rows (i.e., first to M^(th) rows RR(1), RR(2), . . . , and RR(M)) and one column CR(1).

The weight matrix 210 may have “M×N”-number of weight elements, that is, W(1.1)˜W(1.N), . . . , and W(M.1)˜W(M.N). The vector matrix 220 may have “N”-number of vector elements, that is, V(1), V(2), . . . , and V(N). The result matrix 230 may have “M”-number of result elements, that is, MAC_RST(1), MAC_RST(2), . . . , and MAC_RST(M). Hereinafter, the term “weight data” may be construed as having the same meaning as the term “weight element”, and the term “vector data” may be construed as having the same meaning as the term “vector element”. In addition, the term “MAC result data” may be construed as having the same meaning as the term “result element”. Hereinafter, it may be assumed that the weight data and the vector data have a 16-bit floating-point format, for example, a 16-bit brain floating-point (BF16) format.

The MAC result data MAC_RST(1) in the first row RR(1) of the result matrix 230 may be generated by the matrix-vector multiplying calculation of the weight data W(1.1)˜W(1.N) in the first row RW(1) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. The MAC result data MAC_RST(2) in the second row RR(2) of the result matrix 230 may be generated by the matrix-vector multiplying calculation of the weight data W(2.1)˜W(2.N) in the second row RW(2) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. Similarly, the MAC result data MAC_RST(M) in the M^(th) row RR(M) of the result matrix 230 may be generated by the matrix-vector multiplying calculation of the weight data W(M.1)˜W(M.N) in the M^(th) row RW(M) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220.
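The row-by-row relationship above is the ordinary matrix-vector product: each MAC_RST(i) is the sum over j of W(i.j)·V(j). A plain-Python reference sketch, using native floating-point arithmetic rather than BF16 and a hypothetical function name, is:

```python
def matrix_vector(weights: list[list[float]], vector: list[float]) -> list[float]:
    """Return [MAC_RST(1), ..., MAC_RST(M)] for an M x N weight matrix and an N-element vector."""
    n = len(vector)
    if any(len(row) != n for row in weights):
        raise ValueError("every row of the weight matrix must have N elements")
    # Each result element is the dot product of one weight row with the vector.
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]
```

For a 3×2 weight matrix [[1, 2], [3, 4], [5, 6]] and the vector [10, 1], the result matrix is [12, 34, 56].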

FIG. 5 illustrates an example of an execution process of the matrix-vector multiplying calculation illustrated in FIG. 4. The multiple operation circuit 100 may perform the MAC operation of the weight data (e.g., W(1.1)˜W(1.N), W(2.1)˜W(2.N), . . . , or W(M.1)˜W(M.N)) arrayed in any one row among the first to M^(th) rows RW(1)˜RW(M) of the weight matrix 210 with the vector data of the vector matrix 220. For example, as illustrated in FIG. 5, the multiple operation circuit 100 may perform the MAC operation of the weight data W(1.1)˜W(1.N) in the first row RW(1) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220 to generate the MAC result data MAC_RST(1) in the first row RR(1) of the result matrix 230.

Specifically, the multiple operation circuit 100 may perform a first MAC operation using the weight data W(1.1) located at a cross point of the first row RW(1) and the first column CW(1) of the weight matrix 210 and the vector data V(1) located in the first row RV(1) of the vector matrix 220 as input data, thereby generating first MAC data MAC1[15:0]. Next, the multiple operation circuit 100 may perform a second MAC operation of the weight data W(1.2) located at a cross point of the first row RW(1) and the second column CW(2) of the weight matrix 210 and the vector data V(2) located in the second row RV(2) of the vector matrix 220 to generate second MAC data MAC2[15:0]. The second MAC operation may include an accumulative adding calculation for accumulatively adding a result of the multiplying calculation of the weight data W(1.2) and the vector data V(2) to the first MAC data MAC1[15:0].

Subsequently, the multiple operation circuit 100 may perform a third MAC operation of the weight data W(1.3) located at a cross point of the first row RW(1) and the third column CW(3) of the weight matrix 210 and the vector data V(3) located in the third row RV(3) of the vector matrix 220 to generate third MAC data MAC3[15:0]. The third MAC operation may include an accumulative adding calculation for adding a result of the multiplying calculation of the weight data W(1.3) and the vector data V(3) to the second MAC data MAC2[15:0]. These MAC operations may be continuously performed until an N^(th) MAC operation for multiplying the weight data W(1.N) located at a cross point of the first row RW(1) and the N^(th) column CW(N) of the weight matrix 210 by the vector data V(N) located in the N^(th) row RV(N) of the vector matrix 220 is performed. The N^(th) MAC operation may include an accumulative adding calculation for adding a result of the multiplying calculation of the weight data W(1.N) and the vector data V(N) to a result of the (N−1)^(th) MAC operation. N^(th) MAC data MAC“N”[15:0] generated by the N^(th) MAC operation may correspond to the MAC result data MAC_RST(1) in the first row RR(1) of the result matrix 230.
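The sequence of N MAC operations for one row can be sketched as a running sum in which the previous MAC data play the role of the fed-back value. This Python sketch uses native arithmetic instead of BF16 and a hypothetical function name; it returns every intermediate MAC data value so that the last entry corresponds to MAC_RST(1).

```python
def mac_sequence(weight_row: list[float], vector: list[float]) -> list[float]:
    """Return [MAC1, MAC2, ..., MACN] for one weight row and the vector."""
    if len(weight_row) != len(vector):
        raise ValueError("weight row and vector must both have N elements")
    feedback = 0  # the accumulation starts from zero, like an initialized latch
    history = []
    for w, v in zip(weight_row, vector):
        feedback = feedback + w * v  # multiply, then accumulate onto the previous MAC data
        history.append(feedback)
    return history
```

For the row [1, 2, 3] and the vector [4, 5, 6], the intermediate MAC data are 4, 14, and 32, and the last value equals the dot product of the row with the vector.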

FIG. 6 illustrates the first MAC operation of FIG. 5 performed by the multiple operation circuit 100 illustrated in FIG. 1. In FIG. 6, the same reference numerals or symbols as used in FIG. 1 denote the same elements. Referring to FIG. 6, in order to perform the first MAC operation of the multiple operation circuit 100, the first selection signal SS1 having a logic “high(HI)” level, the second selection signal SS2 having a logic “high(HI)” level, and the third selection signal SS3 having a logic “low(LO)” level may be sequentially transmitted to the multiple operation circuit 100. The update signal UPDATE may change from a logic “low(LO)” level into a logic “high(HI)” level once before the second selection signal SS2 is transmitted to the multiple operation circuit 100 and once again after the third selection signal SS3 is transmitted to the multiple operation circuit 100. The weight data W(1.1)[15:0] located at a cross point of the first row RW(1) and the first column CW(1) of the weight matrix (210 of FIG. 4) may be input to the first input terminal of the multiplier 110. The vector data V(1) in the first row RV(1) of the vector matrix (220 of FIG. 4) may be input to the second input terminal of the multiplier 110. As described with reference to FIG. 2, the multiplier 110 may perform a multiplying calculation of the weight data W(1.1)[15:0] and the vector data V(1) to generate and output first multiplication result data WV1[15:0] through the output terminal of the multiplier 110.

The first selector 121 receiving the first selection signal SS1 having the logic “high(HI)” level may output the first multiplication result data WV1[15:0], which are input through the second input terminal IN12 of the first selector 121, through the output terminal OUT1. The first multiplication result data WV1[15:0] output from the first selector 121 may be transmitted to the second input terminal IN22 of the second selector 122. The first multiplication result data WV1[15:0] output from the first selector 121 may also be transmitted to the first input terminal IN41 of the fourth selector 124. When a level of the update signal UPDATE changes from a logic “low(LO)” level into a logic “high(HI)” level, the latch circuit 140 may output its latched data as first feedback data DF1[15:0] which are transmitted to the second input terminal IN32 of the third selector 123. In such a case, because the latch circuit 140 is in an initialized state, the first feedback data DF1[15:0] may have a value of zero. After the latch circuit 140 outputs the first feedback data DF1[15:0] having a value of zero, a level of the update signal UPDATE may change from a logic “high(HI)” level into a logic “low(LO)” level.

The second selector 122 receiving the second selection signal SS2 having the logic “high(HI)” level may output the first multiplication result data WV1[15:0], which are transmitted from the first selector 121 to the second input terminal IN22 of the second selector 122, through the output terminal OUT2. The first multiplication result data WV1[15:0] output from the second selector 122 may be transmitted to the first input terminal of the adder 130. The third selector 123 receiving the second selection signal SS2 having the logic “high(HI)” level may output the first feedback data DF1[15:0], which are transmitted from the latch circuit 140 to the second input terminal IN32 of the third selector 123, through the output terminal OUT3 of the third selector 123. The first feedback data DF1[15:0] output from the third selector 123 may be transmitted to the second input terminal of the adder 130.

The adder 130 may perform an adding calculation using the first multiplication result data WV1[15:0] input to the first input terminal and the first feedback data DF1[15:0] input to the second input terminal as input data, thereby generating and outputting the first MAC data MAC1[15:0]. Because the first feedback data DF1[15:0] have a value of zero, the first MAC data MAC1[15:0] output from the adder 130 may have the same value as the first multiplication result data WV1[15:0] generated by the multiplier 110. The first MAC data MAC1[15:0] output from the adder 130 may be transmitted to the second input terminal IN42 of the fourth selector 124 through the output terminal of the adder 130. In addition, the first MAC data MAC1[15:0] output from the adder 130 may be output from the multiple operation circuit 100 through the first output line 161 to provide the interim result data IY[15:0]. The inverter 150 may change a level of the third selection signal SS3 from a logic “low(LO)” level into a logic “high(HI)” level, and the third selection signal SS3 having a logic “high(HI)” level may be transmitted to the fourth selector 124. The fourth selector 124 receiving the third selection signal SS3 having a logic “high(HI)” level may output the first MAC data MAC1[15:0], which are transmitted from the adder 130 to the second input terminal IN42, through the output terminal OUT4. The first MAC data MAC1[15:0] output from the fourth selector 124 may be transmitted to the input terminal of the latch circuit 140.

When a level of the update signal UPDATE changes from a logic “low(LO)” level into a logic “high(HI)” level, the latch circuit 140 may latch the first MAC data MAC1[15:0] transmitted to the input terminal of the latch circuit 140. In addition, the latch circuit 140 may output the latched first MAC data MAC1[15:0] through the output terminal Q of the latch circuit 140. The first MAC data MAC1[15:0] output from the latch circuit 140 may be transmitted to the second input terminal IN32 of the third selector 123 to provide feedback data which are used for the second MAC operation to be performed at a next step. The first MAC data MAC1[15:0] output from the latch circuit 140 may also be output from the multiple operation circuit 100 through the second output line 162. After the latch circuit 140 outputs the first MAC data MAC1[15:0], a level of the update signal UPDATE may change from a logic “high(HI)” level into a logic “low(LO)” level.

FIG. 7 illustrates the second MAC operation of FIG. 5 performed by the multiple operation circuit 100 illustrated in FIG. 1. In FIG. 7, the same reference numerals or symbols as used in FIG. 1 denote the same elements. Referring to FIG. 7, in order to perform the second MAC operation of the multiple operation circuit 100, the first selection signal SS1 having a logic “high(HI)” level, the second selection signal SS2 having a logic “high(HI)” level, and the third selection signal SS3 having a logic “low(LO)” level may be sequentially transmitted to the multiple operation circuit 100. The update signal UPDATE may change from a logic “low(LO)” level into a logic “high(HI)” level once before the second selection signal SS2 is transmitted to the multiple operation circuit 100 and once again after the third selection signal SS3 is transmitted to the multiple operation circuit 100. The weight data W(1.2)[15:0] located at a cross point of the first row RW(1) and the second column CW(2) of the weight matrix (210 of FIG. 4) may be input to the first input terminal of the multiplier 110. The vector data V(2) in the second row RV(2) of the vector matrix (220 of FIG. 4) may be input to the second input terminal of the multiplier 110. The multiplier 110 may perform a multiplying calculation of the weight data W(1.2)[15:0] and the vector data V(2) to generate and output second multiplication result data WV2[15:0] through the output terminal of the multiplier 110.

The first selector 121 receiving the first selection signal SS1 having the logic “high(HI)” level may output the second multiplication result data WV2[15:0], which are input through the second input terminal IN12 of the first selector 121, through the output terminal OUT1. The second multiplication result data WV2[15:0] output from the first selector 121 may be transmitted to the second input terminal IN22 of the second selector 122. The second multiplication result data WV2[15:0] output from the first selector 121 may also be transmitted to the first input terminal IN41 of the fourth selector 124. When a level of the update signal UPDATE transmitted to the clock terminal of the latch circuit 140 changes from a logic “low(LO)” level into a logic “high(HI)” level, the latch circuit 140 may output its latched data as second feedback data DF2[15:0] which are transmitted to the second input terminal IN32 of the third selector 123. In such a case, the second feedback data DF2[15:0] may correspond to the first MAC data MAC1[15:0] which were latched in the latch circuit 140 during the first MAC operation described with reference to FIGS. 5 and 6. After the latch circuit 140 outputs the second feedback data DF2[15:0], a level of the update signal UPDATE may change from a logic “high(HI)” level into a logic “low(LO)” level.

The second selector 122 receiving the second selection signal SS2 having the logic “high(HI)” level may output the second multiplication result data WV2[15:0], which are transmitted from the first selector 121 to the second input terminal IN22 of the second selector 122, through the output terminal OUT2. The second multiplication result data WV2[15:0] output from the second selector 122 may be transmitted to the first input terminal of the adder 130. The third selector 123 receiving the second selection signal SS2 having the logic “high(HI)” level through the selection terminal S3 may output the second feedback data DF2[15:0] (i.e., the first MAC data MAC1[15:0]), which are transmitted from the latch circuit 140 to the second input terminal IN32 of the third selector 123, through the output terminal OUT3 of the third selector 123. The first MAC data MAC1[15:0] output from the third selector 123 may be transmitted to the second input terminal of the adder 130.

The adder 130 may perform an adding calculation using the second multiplication result data WV2[15:0] input to the first input terminal and the first MAC data MAC1[15:0] input to the second input terminal as input data, thereby generating and outputting the second MAC data MAC2[15:0]. Accordingly, the second MAC data MAC2[15:0] output from the adder 130 may have a value obtained by accumulatively adding the second multiplication result data WV2[15:0] to the first MAC data MAC1[15:0], as described with reference to FIG. 5. The second MAC data MAC2[15:0] output from the adder 130 may be transmitted to the second input terminal IN42 of the fourth selector 124 through the output terminal of the adder 130. In addition, the second MAC data MAC2[15:0] output from the adder 130 may be output from the multiple operation circuit 100 through the first output line 161 to provide the interim result data IY[15:0]. The inverter 150 may change a level of the third selection signal SS3 from a logic “low(LO)” level into a logic “high(HI)” level, and the third selection signal SS3 having a logic “high(HI)” level may be transmitted to the selection terminal S4 of the fourth selector 124. The fourth selector 124 receiving the third selection signal SS3 having a logic “high(HI)” level may output the second MAC data MAC2[15:0], which are transmitted from the adder 130 to the second input terminal IN42, through the output terminal OUT4. The second MAC data MAC2[15:0] output from the fourth selector 124 may be transmitted to the input terminal of the latch circuit 140.

The latch circuit 140 may be synchronized with a rising edge of the update signal UPDATE to latch the second MAC data MAC2[15:0]. In addition, the latch circuit 140 may output the latched second MAC data MAC2[15:0] through the output terminal Q of the latch circuit 140. The second MAC data MAC2[15:0] output from the latch circuit 140 may be transmitted to the second input terminal IN32 of the third selector 123 to provide feedback data for a third MAC operation to be performed at the next step. The second MAC data MAC2[15:0] output from the latch circuit 140 may also be output from the multiple operation circuit 100 through the second output line 162. After the latch circuit 140 outputs the second MAC data MAC2[15:0], a level of the update signal UPDATE may change from a logic “high(HI)” level into a logic “low(LO)” level.
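The first-operation-mode data flow walked through above can be summarized by a small behavioral sketch in Python. It is a minimal model under stated assumptions, not the circuit itself: plain floats stand in for BF16 data, the latch circuit 140 is assumed to hold zero before the first step, and the helper names `mux` and `mac_row` are hypothetical. It shows only how the selector levels route the product and the latched feedback into the adder 130, so that the latch ends up holding the running MAC value.

```python
def mux(sel_high, in_low, in_high):
    """2:1 selector: the low-level input passes when the selection bit is low."""
    return in_high if sel_high else in_low


def mac_row(weights, vector):
    """Accumulate W(1.k) * V(k) through the latch feedback path (first operation mode).

    Floats stand in for BF16 data, and latch 140 is assumed to reset to 0.0.
    Returns the final latched MAC value and the interim values seen on line 161.
    """
    ss1, ss2, ss3 = True, True, False   # selection levels used for the MAC operation
    latch_q = 0.0                       # assumed initial content of latch circuit 140
    interim = []                        # interim result data IY on first output line 161
    for w, v in zip(weights, vector):
        product = w * v                         # multiplier 110
        out1 = mux(ss1, None, product)          # selector 121 passes IN12
        add_in1 = mux(ss2, None, out1)          # selector 122 passes IN22
        add_in2 = mux(ss2, None, latch_q)       # selector 123 passes IN32 (feedback)
        mac = add_in1 + add_in2                 # adder 130
        interim.append(mac)
        latch_d = mux(not ss3, out1, mac)       # inverter 150 drives S4 high: IN42
        latch_q = latch_d                       # captured on the UPDATE rising edge
    return latch_q, interim
```

For example, `mac_row([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` evaluates to `(32.0, [4.0, 14.0, 32.0])`: each interim value is the previously latched value plus the new product.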

FIG. 8 illustrates an example of the matrix-scalar multiplying calculation executed by the EW multiplying calculation in the second operation mode of the multiple operation circuit 100 illustrated in FIG. 1. Referring to FIG. 8, the multiple operation circuit 100 may execute the matrix-scalar multiplying calculation of a weight matrix 310 and a constant C to perform the EW multiplying calculation for generating a result matrix 330. In the present embodiment, it may be assumed that the weight matrix 310 is the same as the weight matrix 210 described with reference to FIG. 5. Thus, the weight matrix 310 may have “M×N” sets of weight data W(1.1)˜W(1.N), . . . , and W(M.1)˜W(M.N). In contrast, the constant C may consist of a single datum. The result matrix 330 may have, as its elements, EWM result data EWM(1.1)˜EWM(1.N), . . . , and EWM(M.1)˜EWM(M.N), which are generated by the EW multiplying calculation of the multiple operation circuit 100. Accordingly, the result matrix 330 may have the same size as the weight matrix 310. That is, the result matrix 330 may have “M”-number of rows (i.e., first to M^(th) rows R(1)˜R(M)) and “N”-number of columns (i.e., first to N^(th) columns C(1)˜C(N)). Each EWM result datum of the result matrix 330 may be obtained by multiplying the corresponding weight datum among W(1.1)˜W(1.N), . . . , and W(M.1)˜W(M.N) by the constant C, that is, EWM(i.j) = W(i.j)×C. As such, the EW multiplying calculation executed in the second operation mode of the multiple operation circuit 100 may be achieved using only the multiplying calculation without using any accumulative adding calculation.

FIG. 9 illustrates the EW multiplying calculation of FIG. 8 executed by the multiple operation circuit 100 illustrated in FIG. 1. In FIG. 9, the same reference numerals or symbols as used in FIG. 1 denote the same elements. Referring to FIG. 9, in order to perform the EW multiplying calculation in the second operation mode of the multiple operation circuit 100, the first selection signal SS1 having a logic “high(HI)” level and the third selection signal SS3 having a logic “high(HI)” level may be applied to the first selector 121 and the fourth selector 124, respectively. In such a case, the second selection signal SS2 is inactivated. Thus, the second and third selectors 122 and 123 do not operate. A level of the update signal UPDATE may change from a logic “low(LO)” level into a logic “high(HI)” level when a certain time has elapsed after the third selection signal SS3 is applied to the multiple operation circuit 100 (e.g., after the output data of the fourth selector 124 are transmitted to the input terminal of the latch circuit 140). The weight data W(1.1)[15:0] located at a cross point of the first row RW(1) and the first column CW(1) of the weight matrix (310 of FIG. 8) may be input to the first input terminal of the multiplier 110. Constant data C[15:0] may be input to the second input terminal of the multiplier 110. The constant data C[15:0] may be provided by transforming the constant (C of FIG. 8) into the same format (e.g., the BF16 format) as the weight data. The multiplier 110 may perform a multiplying calculation of the weight data W(1.1)[15:0] and the constant data C[15:0] to generate and output first multiplication result data WC1[15:0] through the output terminal of the multiplier 110.

The first selector 121 receiving the first selection signal SS1 having the logic “high(HI)” level may output the first multiplication result data WC1[15:0], which are input through the second input terminal IN12 of the first selector 121, through the output terminal OUT1. The first multiplication result data WC1[15:0] output from the first selector 121 may be transmitted to the second input terminal IN22 of the second selector 122 and the first input terminal IN41 of the fourth selector 124. Because the second selection signal SS2 is inactivated, the second and third selectors 122 and 123 do not operate, and the adder 130 does not operate either. The inverter 150 may change a level of the third selection signal SS3 from a logic “high(HI)” level into a logic “low(LO)” level, and the third selection signal SS3 having a logic “low(LO)” level may be transmitted to the selection terminal S4 of the fourth selector 124. The fourth selector 124 receiving the third selection signal SS3 having a logic “low(LO)” level may output the first multiplication result data WC1[15:0], which are transmitted from the output terminal OUT1 of the first selector 121 to the first input terminal IN41 of the fourth selector 124, through the output terminal OUT4. The first multiplication result data WC1[15:0] output from the fourth selector 124 may be transmitted to the input terminal of the latch circuit 140.

The latch circuit 140 may be synchronized with a rising edge of the update signal UPDATE to latch the first multiplication result data WC1[15:0]. In addition, the latch circuit 140 may output the latched first multiplication result data WC1[15:0] through the output terminal Q of the latch circuit 140. After the latch circuit 140 outputs the first multiplication result data WC1[15:0], a level of the update signal UPDATE may change from a logic “high(HI)” level into a logic “low(LO)” level. The first multiplication result data WC1[15:0] output from the latch circuit 140 may be output from the multiple operation circuit 100 through the second output line 162. The first multiplication result data WC1[15:0] output from the multiple operation circuit 100 may correspond to the EWM result data EWM(1.1) located at a cross point of the first row R(1) and the first column C(1) of the result matrix 330 illustrated in FIG. 8.
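The EW multiplying calculation of FIGS. 8 and 9 can be sketched the same way, assuming floats in place of BF16 data and a hypothetical `ew_multiply` helper. The sketch only illustrates the routing: with SS1 high and SS3 inverted to a low level, each product bypasses the adder 130 and goes directly from the first selector 121 to the latch circuit 140, so every result element is EWM(i.j) = W(i.j)×C.

```python
def mux(sel_high, in_low, in_high):
    """2:1 selector: the low-level input passes when the selection bit is low."""
    return in_high if sel_high else in_low


def ew_multiply(weight_matrix, c):
    """Element-wise matrix-scalar product, one element per pass through circuit 100.

    Floats stand in for BF16 data; the adder path is unused because SS2 is inactive.
    """
    ss1, ss3 = True, True                        # second operation mode, EW multiply
    result = []
    for row in weight_matrix:
        out_row = []
        for w in row:
            product = w * c                      # multiplier 110: W(i.j) x C
            out1 = mux(ss1, None, product)       # selector 121 passes IN12
            latch_d = mux(not ss3, out1, None)   # inverter 150 drives S4 low: IN41
            out_row.append(latch_d)              # latched on UPDATE, output on line 162
        result.append(out_row)
    return result
```

Under these assumptions, `ew_multiply([[1.0, 2.0], [3.0, 4.0]], 0.5)` returns `[[0.5, 1.0], [1.5, 2.0]]`, a result matrix of the same size as the weight matrix.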

FIG. 10 illustrates an example of a matrix adding calculation executed by the EW adding calculation in the second operation mode of the multiple operation circuit 100 illustrated in FIG. 1. Referring to FIG. 10, the multiple operation circuit 100 may execute a matrix adding calculation of a first matrix 410 and a second matrix 420 to perform the EW adding calculation for generating a result matrix 430. In the present embodiment, it may be assumed that the first matrix 410 and the second matrix 420 have the same form as the weight matrix 210 described with reference to FIG. 5. Thus, the first matrix 410 may have “M×N”-number of first data A(1.1)˜A(1.N), . . . , and A(M.1)˜A(M.N), and the second matrix 420 may also have “M×N”-number of second data B(1.1)˜B(1.N), . . . , and B(M.1)˜B(M.N). The result matrix 430 may have, as its elements, EWA result data EWA(1.1)˜EWA(1.N), . . . , and EWA(M.1)˜EWA(M.N), which are generated by the EW adding calculation of the multiple operation circuit 100. Accordingly, the result matrix 430 may have the same size as each of the first matrix 410 and the second matrix 420. That is, the result matrix 430 may have “M”-number of rows (i.e., first to M^(th) rows R(1)˜R(M)) and “N”-number of columns (i.e., first to N^(th) columns C(1)˜C(N)). The EWA result data corresponding to the elements of the result matrix 430 may be obtained by adding the first data A(1.1)˜A(1.N), . . . , and A(M.1)˜A(M.N) of the first matrix 410 to the respective second data B(1.1)˜B(1.N), . . . , and B(M.1)˜B(M.N) of the second matrix 420, that is, EWA(i.j) = A(i.j)+B(i.j). As such, the EW adding calculation executed in the second operation mode of the multiple operation circuit 100 may be achieved using only the adding calculation, without using either the multiplying calculation or the accumulative adding calculation executed for the MAC operation.

FIG. 11 illustrates the EW adding calculation of FIG. 10 executed by the multiple operation circuit 100 illustrated in FIG. 1. In FIG. 11, the same reference numerals or symbols as used in FIG. 1 denote the same elements. Referring to FIG. 11, in order to execute the EW adding calculation in the second operation mode of the multiple operation circuit 100, the second selection signal SS2 having a logic “low(LO)” level may be applied to the selection terminal S2 of the second selector 122 and the selection terminal S3 of the third selector 123. In addition, the third selection signal SS3 having a logic “low(LO)” level may be transmitted to an input terminal of the inverter 150 coupled to the selection terminal S4 of the fourth selector 124. The first selection signal SS1 may be inactivated so that the first selector 121 does not operate. A level of the update signal UPDATE may change from a logic “low(LO)” level into a logic “high(HI)” level when a certain time has elapsed after the third selection signal SS3 is applied to the multiple operation circuit 100 (e.g., after the output data of the fourth selector 124 are transmitted to the input terminal of the latch circuit 140).

The first data A(1.1)[15:0] located at a cross point of the first row R(1) and the first column C(1) of the first matrix (410 of FIG. 10) may be transmitted to the first input terminal IN21 of the second selector 122. The second data B(1.1)[15:0] located at a cross point of the first row R(1) and the first column C(1) of the second matrix (420 of FIG. 10) may be transmitted to the first input terminal IN31 of the third selector 123. The second selector 122 receiving the second selection signal SS2 having a logic “low(LO)” level may output the first data A(1.1)[15:0], which are input to the first input terminal IN21, through the output terminal OUT2. The first data A(1.1)[15:0] output from the second selector 122 may be transmitted to the first input terminal of the adder 130. The third selector 123 receiving the second selection signal SS2 having a logic “low(LO)” level may output the second data B(1.1)[15:0], which are input to the first input terminal IN31, through the output terminal OUT3. The second data B(1.1)[15:0] output from the third selector 123 may be transmitted to the second input terminal of the adder 130.

The adder 130 may perform an adding calculation of the first data A(1.1)[15:0] input to the first input terminal and the second data B(1.1)[15:0] input to the second input terminal, thereby generating addition result data DA11[15:0]. The addition result data DA11[15:0] generated by the adder 130 may be transmitted to the second input terminal IN42 of the fourth selector 124 through the output terminal of the adder 130. In addition, the addition result data DA11[15:0] generated by the adder 130 may be output from the multiple operation circuit 100 through the first output line 161 to provide the interim result data IY[15:0]. The fourth selector 124 receiving a logic “high(HI)” level output from the inverter 150 may output the addition result data DA11[15:0], which are transmitted from the adder 130 to the second input terminal IN42 of the fourth selector 124, through the output terminal OUT4. The addition result data DA11[15:0] output from the fourth selector 124 may be transmitted to the input terminal of the latch circuit 140.

The latch circuit 140 may be synchronized with a rising edge of the update signal UPDATE to latch the addition result data DA11[15:0] output from the fourth selector 124. In addition, the latch circuit 140 may output the latched addition result data DA11[15:0] through the output terminal Q of the latch circuit 140. After the latch circuit 140 outputs the addition result data DA11[15:0], a level of the update signal UPDATE may change from a logic “high(HI)” level into a logic “low(LO)” level. The addition result data DA11[15:0] output from the latch circuit 140 may be output from the multiple operation circuit 100 through the second output line 162. The addition result data DA11[15:0] output from the multiple operation circuit 100 may correspond to the EWA result data EWA(1.1) located at a cross point of the first row R(1) and the first column C(1) of the result matrix 430 illustrated in FIG. 10.
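A matching sketch for the EW adding calculation of FIGS. 10 and 11 follows, again with floats standing in for BF16 data and a hypothetical `ew_add` helper. With SS2 low, the second and third selectors pass the externally supplied A(i.j) and B(i.j) to the adder 130, and the inverted SS3 level routes the sum through the fourth selector 124 to the latch circuit 140; the multiplier path is not modeled.

```python
def mux(sel_high, in_low, in_high):
    """2:1 selector: the low-level input passes when the selection bit is low."""
    return in_high if sel_high else in_low


def ew_add(first, second):
    """Element-wise matrix addition, one element per pass through circuit 100.

    Floats (or ints) stand in for BF16 data; SS1 is inactive, so selector 121 is unused.
    """
    ss2, ss3 = False, False                      # second operation mode, EW add
    result = []
    for row_a, row_b in zip(first, second):
        out_row = []
        for a, b in zip(row_a, row_b):
            add_in1 = mux(ss2, a, None)          # selector 122 passes IN21: A(i.j)
            add_in2 = mux(ss2, b, None)          # selector 123 passes IN31: B(i.j)
            total = add_in1 + add_in2            # adder 130, also driven onto line 161
            latch_d = mux(not ss3, None, total)  # inverter 150 drives S4 high: IN42
            out_row.append(latch_d)              # latched on UPDATE, output on line 162
        result.append(out_row)
    return result
```

Under this model, `ew_add([[1, 2], [3, 4]], [[10, 20], [30, 40]])` returns `[[11, 22], [33, 44]]`.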

FIG. 12 illustrates the accumulating calculation executed in the third operation mode of the multiple operation circuit 100 illustrated in FIG. 1. In the third operation mode of the multiple operation circuit 100, the accumulating calculation may be performed when certain data are latched in the latch circuit 140. In an embodiment, in order for the certain data to be latched in the latch circuit 140, the MAC operation described with reference to FIGS. 6 and 7 may be performed in advance in the first operation mode of the multiple operation circuit 100. The present embodiment will be described in conjunction with a case in which the multiplying calculation of the weight data W(1.2) located at a cross point of the first row RW(1) and the second column CW(2) of the weight matrix 210 illustrated in FIG. 5 and the vector data V(2) in the second row RV(2) of the vector matrix 220 illustrated in FIG. 5 is executed by the MAC operation performed in the first operation mode of the multiple operation circuit 100. Thus, the second multiplication result data WV2[15:0] may be latched in the latch circuit 140 of the multiple operation circuit 100. In addition, it may be assumed that the first result data IY-1 transmitted to the multiple operation circuit 100 are the first multiplication result data WV1[15:0] generated by the multiplying calculation of the weight data W(1.1) located at a cross point of the first row RW(1) and the first column CW(1) of the weight matrix 210 and the vector data V(1) in the first row RV(1) of the vector matrix 220.

Referring to FIG. 12, the multiple operation circuit 100 may receive the first multiplication result data WV1[15:0] corresponding to the first result data IY-1 while the second multiplication result data WV2[15:0] are latched in the latch circuit 140. The multiple operation circuit 100 may also receive the first selection signal SS1 having a logic “low(LO)” level and the second selection signal SS2 having a logic “high(HI)” level. The first selection signal SS1 having a logic “low(LO)” level may be transmitted to the selection terminal S1 of the first selector 121. The second selection signal SS2 having a logic “high(HI)” level may be transmitted to the selection terminal S2 of the second selector 122 and the selection terminal S3 of the third selector 123. The multiple operation circuit 100 may receive the update signal UPDATE, a level of which changes from a logic “low(LO)” level into a logic “high(HI)” level before the second selection signal SS2 is transmitted to the second selector 122 and the third selector 123. The update signal UPDATE may be transmitted to the clock terminal of the latch circuit 140. The third selection signal SS3 may be inactivated so that the fourth selector 124 does not operate.

The first multiplication result data WV1[15:0] corresponding to the first result data IY-1 applied to the multiple operation circuit 100 may be transmitted to the first input terminal IN11 of the first selector 121. The first selector 121 may output the first multiplication result data WV1[15:0] through the output terminal OUT1 in response to the first selection signal SS1 having a logic “low(LO)” level. The first multiplication result data WV1[15:0] output from the first selector 121 may be transmitted to the second input terminal IN22 of the second selector 122. The latch circuit 140 may be synchronized with a rising edge of the update signal UPDATE to output the second multiplication result data WV2[15:0], which are latched in the latch circuit 140, through the output terminal Q. The second multiplication result data WV2[15:0] output from the latch circuit 140 may be fed back to the second input terminal IN32 of the third selector 123. In addition, the second multiplication result data WV2[15:0] output from the latch circuit 140 may be output from the multiple operation circuit 100 to provide the operation result data Y[15:0].

When the second selection signal SS2 having a logic “high(HI)” level is transmitted to the selection terminal S2 of the second selector 122, the second selector 122 may output the first multiplication result data WV1[15:0], which are input to the second input terminal IN22 of the second selector 122, through the output terminal OUT2. The first multiplication result data WV1[15:0] output from the second selector 122 may be transmitted to the first input terminal of the adder 130. When the second selection signal SS2 having a logic “high(HI)” level is transmitted to the selection terminal S3 of the third selector 123, the third selector 123 may output the second multiplication result data WV2[15:0], which are input to the second input terminal IN32 of the third selector 123, through the output terminal OUT3. The second multiplication result data WV2[15:0] output from the third selector 123 may be transmitted to the second input terminal of the adder 130.

The adder 130 may add the first multiplication result data WV1[15:0] input to the first input terminal of the adder 130 to the second multiplication result data WV2[15:0] input to the second input terminal of the adder 130, thereby generating the second MAC data MAC2[15:0]. The second MAC data MAC2[15:0] generated by the adder 130 may be output from the multiple operation circuit 100 through the first output line 161 to provide the second result data IY[15:0]. As described with reference to FIG. 5, the second MAC data MAC2[15:0] output from the multiple operation circuit 100 through the first output line 161 may have the same value as the data generated by the second MAC operation.
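The third-mode accumulation reduces to a few lines under the same assumptions (floats for BF16 data; `accumulate` is a hypothetical name). The function takes the incoming first result data IY-1 and the value already held by the latch circuit 140, and returns what would appear on the first output line 161 and on the second output line 162.

```python
def mux(sel_high, in_low, in_high):
    """2:1 selector: the low-level input passes when the selection bit is low."""
    return in_high if sel_high else in_low


def accumulate(first_result, latched):
    """Third operation mode: add incoming result data IY-1 to the latched data.

    Returns (IY on first output line 161, Y on second output line 162).
    SS3 is inactive, so selector 124 is not modeled.
    """
    ss1, ss2 = False, True
    out1 = mux(ss1, first_result, None)   # selector 121 passes IN11: IY-1
    y = latched                           # latch Q released on the UPDATE rising edge
    add_in1 = mux(ss2, None, out1)        # selector 122 passes IN22
    add_in2 = mux(ss2, None, latched)     # selector 123 passes IN32 (feedback)
    return add_in1 + add_in2, y           # adder 130 output, latched output
```

With illustrative values WV1 = 1.0×4.0 and WV2 = 2.0×5.0, `accumulate(4.0, 10.0)` returns `(14.0, 10.0)`; the 14.0 matches the second MAC value that a first-mode accumulation of the same two products would produce.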

FIG. 13 illustrates a configuration of a multiple operation circuit 500 according to another embodiment of the present disclosure. Referring to FIG. 13, the multiple operation circuit 500 may include a multiplier 510, first to fourth selectors 521˜524, an adder 530, a latch circuit 540, an inverter 550, and a normalizer 570. The first to fourth selectors 521˜524, the adder 530, the latch circuit 540, and the inverter 550 of the multiple operation circuit 500 may have substantially the same configurations as the first to fourth selectors 121˜124, the adder 130, the latch circuit 140, and the inverter 150 of the multiple operation circuit 100 described with reference to FIG. 1, respectively. Thus, the same descriptions as set forth in the embodiment of FIG. 1 will be omitted hereinafter.

The multiplier 510 may differ from the multiplier described with reference to FIG. 2 in that no normalization process is executed by the multiplier 510. Specifically, when the first input data A[15:0] and the second input data B[15:0] having a 16-bit floating-point format are input to the multiple operation circuit 500, the multiplier 510 may perform a multiplying calculation of the first input data A[15:0] and the second input data B[15:0]. The multiplier 510 may generate and output multiplication result data AB[24:0] as a result of the multiplying calculation. Because no normalization process is executed by the multiplier 510, the multiplication result data AB[24:0] output from the multiplier 510 may have a 25-bit floating-point format. Thus, all of the multiplication result data AB[24:0], the feedback data DF[24:0] transmitted to the second input terminal IN32 of the third selector 523, and the MAC data MAC[24:0] output from the adder 530 and the latch circuit 540 may have a 25-bit floating-point format. The 25-bit MAC data MAC[24:0] may be normalized by the normalizer 570 into a 16-bit floating-point format, and the normalized MAC data MAC[15:0] may be output from the normalizer 570 to provide 16-bit result data Y[15:0].

FIG. 14 illustrates an example of a configuration of the multiplier 510 included in the multiple operation circuit 500 illustrated in FIG. 13. In the present embodiment, it may be assumed that both the first input data A[15:0] and the second input data B[15:0] have a 16-bit brain floating-point (BF16) format. Thus, the first input data A[15:0] may consist of a first sign datum S1[0] having one bit, first exponent data E1[7:0] having 8 bits, and first mantissa data M1[6:0] having 7 bits. In addition, the second input data B[15:0] may consist of a second sign datum S2[0] having one bit, second exponent data E2[7:0] having 8 bits, and second mantissa data M2[6:0] having 7 bits. As described with reference to FIG. 13, the multiplication result data AB[24:0] output from the multiplier 510 may have a 25-bit floating-point format. Hereinafter, it may be assumed that the multiplication result data AB[24:0] consist of a fifth sign datum S5[0] having one bit, fifth exponent data E5[7:0] having 8 bits, and fifth mantissa data M5[15:0] having 16 bits.
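The BF16 field split and the 25-bit intermediate format can be illustrated with bit operations in Python. The BF16 layout used here (sign in bit 15, exponent in bits 14 to 7, mantissa in bits 6 to 0) is the standard one; the ordering of S5[0], E5[7:0], and M5[15:0] inside AB[24:0] is not stated above and is an assumption, as are the helper names. `float_to_bf16` simply truncates a float32 and serves only to produce test patterns; it is not a rounding conversion.

```python
import struct


def float_to_bf16(x: float) -> int:
    """Truncate an IEEE-754 float32 to its upper 16 bits (a BF16 bit pattern)."""
    return int.from_bytes(struct.pack(">f", x), "big") >> 16


def split_bf16(word: int):
    """Split a BF16 pattern into (S[0], E[7:0], M[6:0])."""
    return (word >> 15) & 0x1, (word >> 7) & 0xFF, word & 0x7F


def pack_25bit(sign: int, exponent: int, mantissa16: int) -> int:
    """Pack S5[0], E5[7:0], M5[15:0] into AB[24:0] (assumed field order: S, E, M)."""
    return (sign << 24) | ((exponent & 0xFF) << 16) | (mantissa16 & 0xFFFF)
```

For instance, 1.5 truncates to the BF16 pattern 0x3FC0, which splits into a sign of 0, a biased exponent of 127, and a 7-bit mantissa field of 0x40.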

Referring to FIG. 14, the multiplier 510 may include a sign processing circuit 510S, an exponent processing circuit 510E, and a mantissa processing circuit 510M. The sign processing circuit 510S may include an exclusive OR (XOR) gate 511. The XOR gate 511 may receive the first sign datum S1[0] of the first input data A[15:0] and the second sign datum S2[0] of the second input data B[15:0]. When only one of the first sign datum S1[0] and the second sign datum S2[0] has a binary number of “1”, meaning a negative number, the XOR gate 511 may output a binary number of “1”, meaning a negative number. In contrast, when the first sign datum S1[0] and the second sign datum S2[0] both have a binary number of “0”, meaning a positive number, or both have a binary number of “1”, meaning a negative number, the XOR gate 511 may output a binary number of “0”, meaning a positive number. The output datum of the XOR gate 511 may correspond to the fifth sign datum S5[0] of the 25-bit multiplication result data AB[24:0].

The exponent processing circuit 510E may include a first exponent adder 512 and a second exponent adder 513. The first exponent adder 512 may receive the first exponent data E1[7:0] of the first input data A[15:0] and the second exponent data E2[7:0] of the second input data B[15:0]. The first exponent adder 512 may add the first exponent data E1[7:0] to the second exponent data E2[7:0] to generate and output addition result data. The first exponent data E1[7:0] may have a value obtained by adding an exponent bias value corresponding to a decimal number of “127” to the original (unbiased) exponent of the first input data A[15:0], and the second exponent data E2[7:0] may likewise have a value obtained by adding the exponent bias value of “127” to the original exponent of the second input data B[15:0]. Thus, the addition result data output from the first exponent adder 512 include the exponent bias value twice. In order to obtain an exponent that includes the exponent bias value only once, the second exponent adder 513 may add a minus exponent bias value corresponding to a decimal number of “−127” to the addition result data output from the first exponent adder 512, thereby subtracting a decimal number of “127” from the addition result data. Addition result data output from the second exponent adder 513 may correspond to the fifth exponent data E5[7:0] of the 25-bit multiplication result data AB[24:0].
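The sign and exponent paths of the multiplier 510 reduce to an XOR and two additions. The sketch below, with hypothetical function names, shows why the second exponent adder 513 subtracts 127: the sum of two biased exponents carries the bias twice. Exponent overflow and underflow are not covered by the description above and are not handled here.

```python
BF16_EXPONENT_BIAS = 127


def product_sign(s1: int, s2: int) -> int:
    """XOR gate 511: 1 (negative) only when exactly one operand is negative."""
    return s1 ^ s2


def product_exponent(e1: int, e2: int) -> int:
    """Exponent path of multiplier 510 for biased 8-bit exponents.

    e1 + e2 carries the bias twice (first exponent adder 512); adding -127
    (second exponent adder 513) leaves a single bias. No range checking.
    """
    partial = e1 + e2                       # first exponent adder 512
    return partial - BF16_EXPONENT_BIAS     # second exponent adder 513
```

For example, 2.0 (biased exponent 128) times 4.0 (biased exponent 129) gives `product_exponent(128, 129) == 130`, i.e. 2^3 before any normalization.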

The mantissa processing circuit 510M may include a mantissa multiplier 514. The mantissa multiplier 514 may receive first mantissa data M1[7:0] of the first input data A[15:0] and second mantissa data M2[7:0] of the second input data B[15:0]. The first mantissa data M1[7:0] may be provided by adding an implied bit IB of “1.” to the first mantissa data M1[6:0] to have an 8-bit form of “1.M1[6:0]” and may be input to the mantissa multiplier 514. Similarly, the second mantissa data M2[7:0] may also be provided by adding the implied bit IB of “1.” to the second mantissa data M2[6:0] to have an 8-bit form of “1.M2[6:0]” and may be input to the mantissa multiplier 514. The mantissa multiplier 514 may perform a multiplying calculation of the first mantissa data M1[7:0] having 8 bits and the second mantissa data M2[7:0] having 8 bits. The mantissa multiplier 514 may output 16-bit data as a result of the multiplying calculation. The 16-bit data output from the mantissa multiplier 514 may correspond to the fifth mantissa data M5[15:0] having 16 bits included in the multiplication result data AB[24:0] having a 25-bit floating-point format. Because each 8-bit operand has seven bits below its binary floating-point, the 16-bit product has fourteen bits below its binary floating-point. Thus, because no normalization process is executed by the multiplier 510, the binary floating-point of the fifth mantissa data M5[15:0] included in the multiplication result data AB[24:0] may be located between the fourteenth bit M5[13] and the fifteenth bit M5[14] of the fifth mantissa data M5[15:0].
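The mantissa path and the position of the binary floating-point in M5[15:0] can be checked numerically. The sketch below (hypothetical name `mantissa_product`) prepends the implied bit, multiplies, and treats the 16-bit result as a fixed-point value with 14 fraction bits.

```python
def mantissa_product(m1: int, m2: int) -> int:
    """Mantissa multiplier 514: multiply 1.M1[6:0] by 1.M2[6:0].

    Each 7-bit field gets the implied bit prepended (8 bits each). The 16-bit
    product M5[15:0] has 14 fraction bits, so its binary point lies between
    M5[13] and M5[14], and M5[15] is set whenever the product is >= 2.
    """
    a = 0x80 | (m1 & 0x7F)    # M1[7:0] = 1.M1[6:0]
    b = 0x80 | (m2 & 0x7F)    # M2[7:0] = 1.M2[6:0]
    return a * b              # at most 0xFF * 0xFF = 0xFE01, fits in 16 bits
```

For 1.5×1.5 both 7-bit fields are 0x40, and `mantissa_product(0x40, 0x40)` is 0x9000, i.e. 36864 / 2^14 = 2.25; its MSB M5[15] is 1, which is the case the normalizer 570 has to shift.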

FIG. 15 illustrates an example of the normalizer 570 included in the multiple operation circuit 500 illustrated in FIG. 13. Referring to FIG. 15, the normalizer 570 may receive the MAC data MAC[24:0] having a 25-bit floating-point format and may normalize the MAC data MAC[24:0]. In the multiple operation circuit 500, the MAC data MAC[24:0] may have the same format as the multiplication result data AB[24:0] output from the multiplier 510. Thus, as described with reference to FIG. 14, the MAC data MAC[24:0] input to the normalizer 570 may consist of a sixth sign datum S6[0] having one bit, sixth exponent data E6[7:0] having 8 bits, and sixth mantissa data M6[15:0] having 16 bits. The normalizer 570 may normalize the MAC data MAC[24:0] to generate and output the result data Y[15:0] having a 16-bit brain floating-point (BF16) format. Thus, the result data Y[15:0] output from the normalizer 570 may consist of a seventh sign datum S7[0] having one bit, seventh exponent data E7[7:0] having 8 bits, and seventh mantissa data M7[6:0] having 7 bits. In the normalizer 570, no normalization process is applied to the sixth sign datum S6[0] of the MAC data MAC[24:0]. Thus, the sixth sign datum S6[0] may be output from the normalizer 570 without any change to provide the seventh sign datum S7[0] of the result data Y[15:0].

The normalizer 570 may include a floating-point shifter 571, a multiplexer 572, a round processor 573, and an adder 574. The floating-point shifter 571 may receive the sixth mantissa data M6[15:0] having 16 bits from the latch circuit (540 of FIG. 13). The floating-point shifter 571 may shift the binary floating-point of the sixth mantissa data M6[15:0] by one bit toward the most significant bit (MSB) of the sixth mantissa data M6[15:0] to generate and output sixth mantissa data having the shifted binary floating-point. Specifically, because the MAC data MAC[24:0] have the same format as the multiplication result data AB[24:0] output from the multiplier 510 as described with reference to FIG. 14, the binary floating-point of the sixth mantissa data M6[15:0] may also be located between the fourteenth bit M6[13] and the fifteenth bit M6[14] of the sixth mantissa data M6[15:0]. Thus, two bits including the MSB (i.e., the fifteenth bit M6[14] and the MSB M6[15]) of the sixth mantissa data M6[15:0] may be located on the left side of the binary floating-point of the sixth mantissa data M6[15:0]. The floating-point shifter 571 may shift the binary floating-point of the sixth mantissa data M6[15:0] such that the binary floating-point is located between the fifteenth bit M6[14] and the MSB M6[15] of the sixth mantissa data M6[15:0]. When the MSB M6[15] of the sixth mantissa data M6[15:0] has a binary number of “1”, the data generated by the floating-point shifter 571 may have a form of “1.M6[14:0]” including the implied bit. However, when the MSB M6[15] of the sixth mantissa data M6[15:0] has a binary number of “0”, the data generated by the floating-point shifter 571 may have a form of “0.M6[14:0]” without the implied bit. The data having the binary floating-point shifted by the floating-point shifter 571 may be transmitted to a first input terminal IN1 of the multiplexer 572.

The multiplexer 572 may receive the data having the binary floating-point shifted by the floating-point shifter 571 through the first input terminal IN1 of the multiplexer 572. In addition, the multiplexer 572 may receive the sixth mantissa data M6[15:0] of the MAC data MAC[24:0] through a second input terminal IN2 of the multiplexer 572. Furthermore, the multiplexer 572 may receive the MSB datum M6[15] of the sixth mantissa data M6[15:0] through a selection terminal of the multiplexer 572. When the MSB M6[15] of the sixth mantissa data M6[15:0] has a binary number of “1” corresponding to a logic “high” level, the multiplexer 572 may output the data input to the first input terminal IN1 (i.e., 16-bit data having a format of “1.M6[14:0]” including the implied bit). When the MSB M6[15] of the sixth mantissa data M6[15:0] has a binary number of “0” corresponding to a logic “low” level, the multiplexer 572 may output the sixth mantissa data M6[15:0] input to the second input terminal IN2. In such a case, the sixth mantissa data M6[15:0] output from the multiplexer 572 may have a format of “01.M6[13:0]”, and data having a format of “1.M6[13:0]”, in which the fifteenth bit M6[14] serves as the implied bit, may be obtained by removing the MSB M6[15] (having a logic “low(0)” level) from the sixth mantissa data M6[15:0] having a format of “01.M6[13:0]”.

The round processor 573 may receive the 16-bit data from the multiplexer 572. The round processor 573 may remove 9 bits, including the implied bit, from the 16-bit data output from the multiplexer 572 to generate 7-bit data, and may perform a rounding operation while removing the 9 bits. During the rounding operation, depending on the removed bits, an adding calculation that adds a value of “1” to the retained 7-bit data may be performed as part of a round-off operation or a round-up operation. The round processor 573 may generate and output the seventh mantissa data M7[6:0] having 7 bits included in the result data Y[15:0] as a result of the bit-count adjustment and the rounding operation.

The adder 574 may receive the sixth exponent data E6[7:0] having 8 bits of the MAC data MAC[24:0] and the MSB datum M6[15] of the sixth mantissa data M6[15:0]. The adder 574 may perform an adding calculation of the sixth exponent data E6[7:0] and the MSB datum M6[15] of the sixth mantissa data M6[15:0]. When the MSB datum M6[15] of the sixth mantissa data M6[15:0] has a binary number of “0”, the adder 574 may output the same data as the sixth exponent data E6[7:0]. When the MSB datum M6[15] of the sixth mantissa data M6[15:0] has a binary number of “1”, the adder 574 may output data generated by adding one to the sixth exponent data E6[7:0]. As described above, when the MSB datum M6[15] of the sixth mantissa data M6[15:0] has a binary number of “1”, the multiplexer 572 may output the data generated by shifting the binary floating-point of the sixth mantissa data M6[15:0] by one bit toward the MSB of the sixth mantissa data M6[15:0]. Because shifting the binary floating-point one bit toward the MSB halves the value represented by the mantissa, the exponent change due to the shift may be compensated in such a case by adding one to the sixth exponent data E6[7:0] input to the adder 574. The 8-bit output data of the adder 574 may provide the seventh exponent data E7[7:0] having 8 bits included in the result data Y[15:0].
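The normalizer 570 as a whole can be sketched as follows, under stated assumptions: the function name is hypothetical, the input mantissa is assumed to have the “1x.xxx” or “01.xxx” form discussed above, and round-half-up on the highest removed bit stands in for the rounding rule of the round processor 573, which the description leaves open. The carry handling when rounding overflows the implied bit is likewise an addition of this sketch. Because shifting the binary floating-point changes how the bits are read rather than the bits themselves, the floating-point shifter 571 and the multiplexer 572 are modeled by choosing how many fraction bits the 16-bit word carries (15 when M6[15] is 1, otherwise 14).

```python
def normalize_to_bf16(s6: int, e6: int, m6: int) -> int:
    """Sketch of normalizer 570: 25-bit (S6, E6[7:0], M6[15:0]) -> BF16 Y[15:0].

    Assumes M6 is either "1x.xxx" (M6[15] = 1) or "01.xxx" (M6[15] = 0).
    Round-half-up and the post-rounding carry are assumptions of this sketch.
    """
    msb = (m6 >> 15) & 1                  # drives the selection terminal of multiplexer 572
    frac_bits = 15 if msb else 14         # binary point after M6[15] (shifted) or after M6[14]
    drop = frac_bits - 7                  # LSBs removed besides the implied bit: 8 or 7
    kept = m6 >> drop                     # implied bit followed by 7 mantissa bits
    kept += (m6 >> (drop - 1)) & 1        # round on the highest removed bit
    e7 = e6 + msb                         # adder 574 compensates the point shift
    if kept >> 8:                         # rounding carried out past the implied bit
        kept >>= 1
        e7 += 1
    m7 = kept & 0x7F                      # remove the implied bit: M7[6:0]
    return (s6 << 15) | ((e7 & 0xFF) << 7) | m7
```

With the 1.5×1.5 product as input (E6 = 127, M6 = 0x9000), `normalize_to_bf16(0, 127, 0x9000)` returns 0x4010, the BF16 pattern of 2.25; an all-ones mantissa 0xFFFF rounds up and carries into the exponent, giving 0x4080 (4.0).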

FIG. 16 illustrates a MAC operator 600 according to an embodiment of the present disclosure. Referring to FIG. 16, the MAC operator 600 may receive “N”-number of first input data A(1)˜A(N) and “N”-number of second input data B(1)˜B(N). The MAC operator 600 may also receive various control signals such as a first selection signal SS1, a second selection signal SS2, a third selection signal SS3, and an update signal UPDATE. The MAC operator 600 may output N^(th) second result data IY(N−1) and “N”-number of operation result data (i.e., first to N^(th) operation result data Y(0)˜Y(N−1)).

The MAC operator 600 may include “N”-number of multiple operation circuits MOC(0)˜MOC(N−1) (i.e., first to N^(th) multiple operation circuits 610(0)˜610(N−1)). Each of the first to N^(th) multiple operation circuits 610(0)˜610(N−1) may have substantially the same configuration as the multiple operation circuit 100 described with reference to FIG. 1, so the configuration and operation described with reference to FIG. 1 apply equally to each of them. Accordingly, descriptions of the configuration and operation of the first to N^(th) multiple operation circuits 610(0)˜610(N−1) will be omitted hereinafter to avoid duplicate explanation.

The first to third selection signals SS1, SS2, and SS3 and the update signal UPDATE input to the MAC operator 600 may be transmitted to each of the first to N^(th) multiple operation circuits 610(0)˜610(N−1). Meanwhile, the “N”-number of first input data A(1)˜A(N) and the “N”-number of second input data B(1)˜B(N) may be transmitted to respective ones of the first to N^(th) multiple operation circuits 610(0)˜610(N−1). For example, the first data A(1) of the first input data A(1)˜A(N) and the first data B(1) of the second input data B(1)˜B(N) may be transmitted to the first multiple operation circuit 610(0), and the second data A(2) of the first input data A(1)˜A(N) and the second data B(2) of the second input data B(1)˜B(N) may be transmitted to the second multiple operation circuit 610(1). Similarly, the N^(th) data A(N) of the first input data A(1)˜A(N) and the N^(th) data B(N) of the second input data B(1)˜B(N) may be transmitted to the N^(th) multiple operation circuit 610(N−1). As described with reference to FIG. 1, the first input data A(1)˜A(N) and the second input data B(1)˜B(N) may have different data formats depending on the calculation executed with them as input data.

Each of the first to N^(th) multiple operation circuits 610(0)˜610(N−1) may receive the first result data IY-1 to generate and output the second result data IY. For example, the first multiple operation circuit 610(0), which is the foremost of the first to N^(th) multiple operation circuits 610(0)˜610(N−1), may receive first data IY-1(0) of the first result data IY-1 to generate and output first data IY(0) of the second result data IY. In an embodiment, the first data IY-1(0) of the first result data IY-1 input to the first multiple operation circuit 610(0) may be fixed to a value of “0”. In another embodiment, the first data IY-1(0) may be provided by an external device coupled to the MAC operator 600 whenever the first multiple operation circuit 610(0) requests it. The second multiple operation circuit 610(1) may receive second data IY-1(1) of the first result data IY-1 to generate and output second data IY(1) of the second result data IY. Similarly, the N^(th) multiple operation circuit 610(N−1), which is the last of the first to N^(th) multiple operation circuits 610(0)˜610(N−1), may receive N^(th) data IY-1(N−1) of the first result data IY-1 to generate and output N^(th) data IY(N−1) of the second result data IY. The N^(th) data IY(N−1) of the second result data IY output from the N^(th) multiple operation circuit 610(N−1) may be output from the MAC operator 600.

The first to N^(th) operation result data Y(0)˜Y(N−1) output from the MAC operator 600 may be output from the first to N^(th) multiple operation circuits 610(0)˜610(N−1), respectively. That is, the operation result data Y generated by the first to N^(th) multiple operation circuits 610(0)˜610(N−1) may be output from the MAC operator 600. The first multiple operation circuit 610(0) may output the first operation result data Y(0), and the second multiple operation circuit 610(1) may output the second operation result data Y(1). Similarly, the N^(th) multiple operation circuit 610(N−1) may output the N^(th) operation result data Y(N−1).

The first to N^(th) multiple operation circuits 610(0)˜610(N−1) may be disposed in series such that an output line of an (i−1)^(th) multiple operation circuit is coupled to an input line of an i^(th) multiple operation circuit (where “i” is a natural number from “2” to “N”). Thus, the second result data IY output from the (i−1)^(th) multiple operation circuit may be the first result data IY-1 input to the i^(th) multiple operation circuit. Specifically, the first data IY(0) of the second result data IY, output through the output line of the first multiple operation circuit 610(0), may correspond to the second data IY-1(1) of the first result data IY-1, input to the second multiple operation circuit 610(1) through its input line. In addition, the second data IY(1) of the second result data IY, output through the output line of the second multiple operation circuit 610(1), may correspond to the third data IY-1(2) of the first result data IY-1, input to the third multiple operation circuit (omitted in FIG. 16) through its input line.

Likewise, the (N−2)^(th) data IY(N−3) of the second result data IY, output through the output line of the (N−2)^(th) multiple operation circuit (omitted in FIG. 16), may correspond to the (N−1)^(th) data IY-1(N−2) of the first result data IY-1, input to the (N−1)^(th) multiple operation circuit 610(N−2) through its input line. In addition, the (N−1)^(th) data IY(N−2) of the second result data IY, output through the output line of the (N−1)^(th) multiple operation circuit 610(N−2), may correspond to the N^(th) data IY-1(N−1) of the first result data IY-1, input to the N^(th) multiple operation circuit 610(N−1) through its input line. The N^(th) data IY(N−1) of the second result data IY, output through the output line of the N^(th) multiple operation circuit 610(N−1), may be output from the MAC operator 600.
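
The serial coupling, in which IY(k−1) of one circuit becomes IY-1(k) of the next, can be modeled as a simple fold over per-circuit functions. This is a minimal sketch: `ripple` and `Stage` are hypothetical names, each circuit is abstracted as a function from its IY-1 input to its IY output, and IY-1(0) defaults to 0 (the fixed-zero embodiment) but may be passed in to mimic a value supplied by an external device.

```python
from typing import Callable, List, Tuple

# One multiple operation circuit, abstracted as: IY-1(k) in -> IY(k) out.
Stage = Callable[[float], float]


def ripple(stages: List[Stage], iy_first: float = 0.0) -> Tuple[List[float], float]:
    """Pass result data through serially coupled circuits 610(0)..610(N-1)."""
    if not stages:
        raise ValueError("at least one multiple operation circuit is required")
    iy: List[float] = []
    prev = iy_first  # IY-1(0): fixed to 0, or supplied by an external device
    for stage in stages:
        prev = stage(prev)  # IY(k) of this circuit becomes IY-1(k+1) of the next
        iy.append(prev)
    return iy, iy[-1]  # IY(N-1) is also output from the MAC operator
```

Only the last element of the returned list leaves the MAC operator as IY(N−1); the intermediate values travel on the internal output-to-input lines.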

The MAC operator 600 may selectively perform the MAC operation in the first operation mode, the EW multiplying calculation or the EW adding calculation in the second operation mode, or the accumulative adding calculation in the third operation mode. The operation or calculation performed by the MAC operator 600 may be selected by the first selection signal SS1, the second selection signal SS2, the third selection signal SS3, and the update signal UPDATE. When the first and second selection signals SS1 and SS2 having a logic “high(HI)” level and the third selection signal SS3 having a logic “low(LO)” level are transmitted to the MAC operator 600, the MAC operator 600 may perform the MAC operation in the first operation mode, like the multiple operation circuit 100 described with reference to FIGS. 6 and 7.

When the first and third selection signals SS1 and SS3 having a logic “high(HI)” level are transmitted to the MAC operator 600 and the second selection signal SS2 is inactivated, the MAC operator 600 may perform the EW multiplying calculation in the second operation mode, like the multiple operation circuit 100 described with reference to FIG. 9. When the second and third selection signals SS2 and SS3 having a logic “low(LO)” level are transmitted to the MAC operator 600 and the first selection signal SS1 is inactivated, the MAC operator 600 may perform the EW adding calculation in the second operation mode, like the multiple operation circuit 100 described with reference to FIG. 11. In addition, when the first selection signal SS1 having a logic “low(LO)” level and the second selection signal SS2 having a logic “high(HI)” level are transmitted to the MAC operator 600 and the third selection signal SS3 is inactivated, the MAC operator 600 may perform the accumulative adding calculation in the third operation mode, like the multiple operation circuit 100 described with reference to FIG. 12.
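
The selection-signal combinations above can be summarized as a lookup table. In this sketch, 1 stands for HI, 0 for LO, and `None` for an inactivated signal; the names `Mode`, `MODE_TABLE`, and `select_mode` are hypothetical, and treating an inactivated signal as a distinct value (rather than a don't-care) is an assumption made for illustration. The update signal UPDATE is not modeled.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

Signal = Optional[int]  # 1 = HI, 0 = LO, None = inactivated


class Mode(Enum):
    MAC = "MAC operation (first operation mode)"
    EW_MUL = "EW multiplying calculation (second operation mode)"
    EW_ADD = "EW adding calculation (second operation mode)"
    ACC_ADD = "accumulative adding calculation (third operation mode)"


# (SS1, SS2, SS3) combinations as described for the MAC operator 600.
MODE_TABLE: Dict[Tuple[Signal, Signal, Signal], Mode] = {
    (1, 1, 0): Mode.MAC,
    (1, None, 1): Mode.EW_MUL,
    (None, 0, 0): Mode.EW_ADD,
    (0, 1, None): Mode.ACC_ADD,
}


def select_mode(ss1: Signal, ss2: Signal, ss3: Signal) -> Mode:
    """Return the operation selected by SS1 to SS3 (UPDATE is not modeled)."""
    try:
        return MODE_TABLE[(ss1, ss2, ss3)]
    except KeyError:
        raise ValueError(f"no operation defined for SS1={ss1}, SS2={ss2}, SS3={ss3}") from None
```

Each pair of rows differs in at least one signal that is driven (HI or LO) in both, so the four combinations cannot be confused with one another.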

When the MAC operator 600 performs the MAC operation in the first operation mode, the MAC operation may be performed in a first MAC operation mode or a second MAC operation mode. Which of the two is used may depend on how the weight data and the vector data are input to the first to N^(th) multiple operation circuits 610(0)˜610(N−1). When the MAC operator 600 performs the MAC operation in the first MAC operation mode, the MAC operator 600 may output the MAC result data MAC_RST located in one of the rows of the result matrix 230 illustrated in FIG. 4. In such a case, the N^(th) data IY(N−1) of the second result data IY output from the N^(th) multiple operation circuit 610(N−1) may correspond to the MAC result data MAC_RST. When the MAC operator 600 performs the MAC operation in the second MAC operation mode, the MAC operator 600 may output plural sets of MAC result data (e.g., the first to M^(th) MAC result data MAC_RST(1)˜MAC_RST(M)) of the result matrix 230 illustrated in FIG. 4. In such a case, assuming (as in FIG. 18) that the number of multiple operation circuits equals the number M of rows of the weight matrix, the operation result data Y(0)˜Y(N−1) output from respective ones of the multiple operation circuits 610(0)˜610(N−1) may correspond to the first to M^(th) MAC result data MAC_RST(1)˜MAC_RST(M), respectively. Each of the first to M^(th) MAC result data MAC_RST(1)˜MAC_RST(M) may be generated by the matrix-vector multiplying calculation performed using the weight data arrayed in one row of the weight matrix and the vector data of the vector matrix as input data.

FIG. 17 illustrates a MAC operation performed in the first MAC operation mode of the MAC operator 600 illustrated in FIG. 16. In FIG. 17, the same reference numerals or symbols as used in FIG. 16 denote the same elements. The present embodiment will be described in conjunction with the MAC operation that uses the weight data W(1.1)˜W(1.N) in the first row RW(1) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220 as input data to generate the first MAC result data MAC_RST(1) in the first row RR(1) of the result matrix 230, in the matrix-vector multiplying calculation described with reference to FIG. 4. The MAC operation according to the present embodiment may be equally applied to generating each of the second to M^(th) MAC result data MAC_RST(2)˜MAC_RST(M) of the result matrix 230. The first to N^(th) multiple operation circuits 610(0)˜610(N−1) may sequentially perform the MAC operation in the first operation mode and then the accumulative adding calculation in the third operation mode, thereby realizing the first MAC operation mode of the MAC operator 600.

Referring to FIG. 17, the MAC operator 600 may receive the weight data W(1.1)˜W(1.N) in the first row RW(1) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. For the MAC operator 600 to perform the MAC operation in the first MAC operation mode, the “N” sets of weight data W(1.1)˜W(1.N) may be transmitted to respective ones of the first to N^(th) multiple operation circuits 610(0)˜610(N−1), and the “N” sets of vector data V(1)˜V(N) may also be transmitted to the first to N^(th) multiple operation circuits 610(0)˜610(N−1), respectively. For example, the weight data W(1.1) located at the cross point of the first row RW(1) and the first column CW(1) of the weight matrix 210, and the vector data V(1) in the first row RV(1) of the vector matrix 220, may be transmitted to the first multiple operation circuit 610(0). In addition, the weight data W(1.2) located at the cross point of the first row RW(1) and the second column CW(2) of the weight matrix 210, and the vector data V(2) in the second row RV(2) of the vector matrix 220, may be transmitted to the second multiple operation circuit 610(1). Similarly, the weight data W(1.N) located at the cross point of the first row RW(1) and the N^(th) column CW(N) of the weight matrix 210, and the vector data V(N) in the N^(th) row RV(N) of the vector matrix 220, may be transmitted to the N^(th) multiple operation circuit 610(N−1).

First, the first and second selection signals SS1 and SS2 having a logic “high(HI)” level, the third selection signal SS3 having a logic “low(LO)” level, and the update signal UPDATE for a latch operation may be transmitted to the MAC operator 600 so that the first to N^(th) multiple operation circuits 610(0)˜610(N−1) of the MAC operator 600 perform the MAC operation in the first operation mode. The first multiple operation circuit 610(0) may multiply the weight data W(1.1) by the vector data V(1) to generate the first multiplication result data WV(1), as described with reference to FIG. 6, and may latch the first multiplication result data WV(1) in the latch circuit (140 of FIG. 6) included in the first multiple operation circuit 610(0).

Substantially the same operation as the one performed by the first multiple operation circuit 610(0) in the first operation mode may be performed in each of the second to N^(th) multiple operation circuits 610(1)˜610(N−1). Accordingly, the second multiple operation circuit 610(1) may multiply the weight data W(1.2) by the vector data V(2) to generate second multiplication result data WV(2) and may latch the second multiplication result data WV(2) in the latch circuit (140 of FIG. 6) included in the second multiple operation circuit 610(1). In addition, the (N−1)^(th) multiple operation circuit 610(N−2) may multiply the weight data W(1.(N−1)) by the vector data V(N−1) to generate (N−1)^(th) multiplication result data WV(N−1) and may latch the (N−1)^(th) multiplication result data WV(N−1) in the latch circuit (140 of FIG. 6) included in the (N−1)^(th) multiple operation circuit 610(N−2). Similarly, the N^(th) multiple operation circuit 610(N−1) may multiply the weight data W(1.N) by the vector data V(N) to generate N^(th) multiplication result data WV(N) and may latch the N^(th) multiplication result data WV(N) in the latch circuit (140 of FIG. 6) included in the N^(th) multiple operation circuit 610(N−1).

Next, the first selection signal SS1 having a logic “low(LO)” level, the second selection signal SS2 having a logic “high(HI)” level, and the update signal UPDATE for a latch operation may be transmitted to the MAC operator 600 while the third selection signal SS3 is inactivated. As a result, the first to N^(th) multiple operation circuits 610(0)˜610(N−1) of the MAC operator 600 may perform the accumulative adding calculation in the third operation mode. The first multiple operation circuit 610(0) may receive the first data IY-1(0) of the first result data IY-1, which has a value of zero. The first multiple operation circuit 610(0) may add the zero-valued first data IY-1(0) to the first multiplication result data WV(1) latched in the first multiple operation circuit 610(0) to generate first MAC data MAC(1), and may output the first MAC data MAC(1) as the first data IY(0) of the second result data IY.

The second multiple operation circuit 610(1) may receive the first MAC data MAC(1), output from the first multiple operation circuit 610(0), as the second data IY-1(1) of the first result data IY-1. The second multiple operation circuit 610(1) may add the first MAC data MAC(1) to the second multiplication result data WV(2) latched in the second multiple operation circuit 610(1) to generate second MAC data MAC(2), and may output the second MAC data MAC(2) as the second data IY(1) of the second result data IY.

The (N−1)^(th) multiple operation circuit 610(N−2) may receive (N−2)^(th) MAC data MAC(N−2), output from the (N−2)^(th) multiple operation circuit (omitted in FIG. 17), as the (N−1)^(th) data IY-1(N−2) of the first result data IY-1. The (N−1)^(th) multiple operation circuit 610(N−2) may add the (N−2)^(th) MAC data MAC(N−2) to the (N−1)^(th) multiplication result data WV(N−1) latched in the (N−1)^(th) multiple operation circuit 610(N−2) to generate (N−1)^(th) MAC data MAC(N−1), and may output the (N−1)^(th) MAC data MAC(N−1) as the (N−1)^(th) data IY(N−2) of the second result data IY.

The N^(th) multiple operation circuit 610(N−1) may receive the (N−1)^(th) MAC data MAC(N−1), output from the (N−1)^(th) multiple operation circuit 610(N−2), as the N^(th) data IY-1(N−1) of the first result data IY-1. The N^(th) multiple operation circuit 610(N−1) may add the (N−1)^(th) MAC data MAC(N−1) to the N^(th) multiplication result data WV(N) latched in the N^(th) multiple operation circuit 610(N−1) to generate N^(th) MAC data MAC(N), and may output the N^(th) MAC data MAC(N) as the N^(th) data IY(N−1) of the second result data IY. The N^(th) MAC data MAC(N), output from the N^(th) multiple operation circuit 610(N−1) as the N^(th) data IY(N−1) of the second result data IY, may be the first MAC result data MAC_RST(1) generated by the matrix-vector multiplying calculation of the weight data W(1.1)˜W(1.N) in the first row RW(1) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220, as described with reference to FIGS. 4 and 5.
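
The two phases of the first MAC operation mode (parallel multiply-and-latch, then a ripple of accumulative additions starting from IY-1(0) = 0) can be sketched with ordinary floats. This sketch ignores the floating-point formats, normalization, and rounding of the actual circuits; `mac_first_mode` is a hypothetical name.

```python
from typing import List, Sequence


def mac_first_mode(weights_row: Sequence[float], vector: Sequence[float]) -> float:
    """Float-only sketch of the first MAC operation mode (FIG. 17)."""
    if len(weights_row) != len(vector):
        raise ValueError("need one weight and one vector element per circuit")
    # Phase 1: SS1=HI, SS2=HI, SS3=LO, UPDATE. Circuit 610(k) computes
    # WV(k+1) = W(1.k+1) * V(k+1) and latches it; all circuits act in parallel.
    latched: List[float] = [w * v for w, v in zip(weights_row, vector)]
    # Phase 2: SS1=LO, SS2=HI, SS3 inactivated, UPDATE. Accumulative adding
    # ripples from IY-1(0) = 0: MAC(k+1) = MAC(k) + WV(k+1).
    iy = 0.0
    for wv in latched:
        iy = iy + wv
    return iy  # IY(N-1) = MAC(N) = MAC_RST(1)
```

For W(1.1)˜W(1.3) = 1, 2, 3 and V(1)˜V(3) = 4, 5, 6, the ripple passes 4, 14, and 32, so MAC_RST(1) = 32.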

FIG. 18 illustrates a MAC operation performed in the second MAC operation mode of the MAC operator 600 illustrated in FIG. 16. In FIG. 18, the same reference numerals or symbols as used in FIG. 16 denote the same elements. The present embodiment will be described in conjunction with the MAC operation that uses the weight matrix 210 and the vector matrix 220 as input data to generate the result matrix 230, in the matrix-vector multiplying calculation described with reference to FIG. 4. In such a case, it may be assumed that the MAC operator 600 includes as many multiple operation circuits 610(0)˜610(M−1) as there are rows RWs in the weight matrix 210; that is, the number of the multiple operation circuits 610(0)˜610(M−1) may be equal to the number of rows of the weight matrix 210. Each of the first to M^(th) multiple operation circuits 610(0)˜610(M−1) may iteratively perform the MAC operation in the first operation mode as many times as there are columns CWs in the weight matrix 210 (which equals the number of rows RVs of the vector matrix 220), thereby realizing the second MAC operation mode of the MAC operator 600.

Referring to FIG. 18, the MAC operator 600 may sequentially receive the weight data W(1.1)˜W(1.N), . . . , and W(M.1)˜W(M.N) arrayed in all of the rows RW(1)˜RW(M) of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. For the MAC operator 600 to perform the MAC operation in the second MAC operation mode, the “N” sets of weight data Ws arrayed in one row of the weight matrix 210 may be sequentially transmitted to one of the first to M^(th) multiple operation circuits 610(0)˜610(M−1), and the “N” sets of vector data V(1)˜V(N) of the vector matrix 220 may also be sequentially transmitted to that multiple operation circuit. For example, the weight data W(1.1)˜W(1.N) arrayed in the first row RW(1) of the weight matrix 210 may be sequentially transmitted to the first multiple operation circuit 610(0), and the vector data V(1)˜V(N) of the vector matrix 220 may also be sequentially transmitted to the first multiple operation circuit 610(0). In addition, the weight data W(2.1)˜W(2.N) arrayed in the second row RW(2) of the weight matrix 210 may be sequentially transmitted to the second multiple operation circuit 610(1), and the vector data V(1)˜V(N) of the vector matrix 220 may also be sequentially transmitted to the second multiple operation circuit 610(1). Similarly, the weight data W((M−1).1)˜W((M−1).N) arrayed in the (M−1)^(th) row RW(M−1) of the weight matrix 210 may be sequentially transmitted to the (M−1)^(th) multiple operation circuit 610(M−2), and the vector data V(1)˜V(N) of the vector matrix 220 may also be sequentially transmitted to the (M−1)^(th) multiple operation circuit 610(M−2). Finally, the weight data W(M.1)˜W(M.N) arrayed in the M^(th) row RW(M) of the weight matrix 210 may be sequentially transmitted to the M^(th) multiple operation circuit 610(M−1), and the vector data V(1)˜V(N) of the vector matrix 220 may also be sequentially transmitted to the M^(th) multiple operation circuit 610(M−1).

First, the first and second selection signals SS1 and SS2 having a logic “high(HI)” level, the third selection signal SS3 having a logic “low(LO)” level, and the update signal UPDATE for a latch operation may be transmitted to the MAC operator 600 so that the first to M^(th) multiple operation circuits 610(0)˜610(M−1) of the MAC operator 600 perform the MAC operation in the first operation mode. When the weight data W(1.1), W(2.1), . . . , W((M−1).1), and W(M.1) arrayed in the first column CW(1) of the weight matrix 210 are transmitted to respective ones of the first to M^(th) multiple operation circuits 610(0)˜610(M−1), and the vector data V(1) in the first row RV(1) of the vector matrix 220 are transmitted to each of them, each of the first to M^(th) multiple operation circuits 610(0)˜610(M−1) may perform the first MAC operation in the first operation mode. This first MAC operation may be the same as the first MAC operation described with reference to FIGS. 5 and 6. Each of the multiple operation circuits 610(0)˜610(M−1) may thereby generate the first MAC data MAC1[15:0] for its row of the weight matrix 210. The “M” sets of first MAC data MAC1[15:0] generated by the multiple operation circuits 610(0)˜610(M−1) may be latched in the latch circuits included in the respective multiple operation circuits 610(0)˜610(M−1).

Next, when the weight data W(1.2), W(2.2), . . . , W((M−1).2), and W(M.2) arrayed in the second column CW(2) of the weight matrix 210 are transmitted to respective ones of the first to M^(th) multiple operation circuits 610(0)˜610(M−1), and the vector data V(2) in the second row RV(2) of the vector matrix 220 are transmitted to each of them, each of the first to M^(th) multiple operation circuits 610(0)˜610(M−1) may perform the second MAC operation in the first operation mode. This second MAC operation may be the same as the second MAC operation described with reference to FIGS. 5 and 7. Each of the multiple operation circuits 610(0)˜610(M−1) may thereby generate the second MAC data MAC2[15:0] for its row of the weight matrix 210. The “M” sets of second MAC data MAC2[15:0] generated by the multiple operation circuits 610(0)˜610(M−1) may be latched in the latch circuits included in the respective multiple operation circuits 610(0)˜610(M−1).

Subsequently, when the weight data W(1.3), W(2.3), . . . , W((M−1).3), and W(M.3) arrayed in the third column CW(3) of the weight matrix 210 are transmitted to respective ones of the first to M^(th) multiple operation circuits 610(0)˜610(M−1), and the vector data V(3) in the third row RV(3) of the vector matrix 220 are transmitted to each of them, each of the first to M^(th) multiple operation circuits 610(0)˜610(M−1) may perform the third MAC operation in the first operation mode, in substantially the same way as described with reference to FIGS. 5 and 7. Each of the multiple operation circuits 610(0)˜610(M−1) may thereby generate the third MAC data MAC3[15:0] for its row of the weight matrix 210. The “M” sets of third MAC data MAC3[15:0] generated by the multiple operation circuits 610(0)˜610(M−1) may be latched in the latch circuits included in the respective multiple operation circuits 610(0)˜610(M−1).

In the same way, each of the multiple operation circuits 610(0)˜610(M−1) may sequentially perform the fourth to N^(th) MAC operations in the first operation mode to sequentially generate fourth to N^(th) MAC data MAC4[15:0]˜MAC(N)[15:0] for its row of the weight matrix 210. The “M” sets of N^(th) MAC data MAC(N)[15:0] generated by the first to M^(th) multiple operation circuits 610(0)˜610(M−1) may correspond to the first to M^(th) MAC result data MAC_RST(1)˜MAC_RST(M) of the result matrix 230 illustrated in FIG. 5, respectively. That is, the first multiple operation circuit 610(0) may output, as the first MAC result data MAC_RST(1), the N^(th) MAC data MAC(N)[15:0] generated by the first to N^(th) MAC operations on the weight data W(1.1)˜W(1.N) arrayed in the first row of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. In addition, the second multiple operation circuit 610(1) may output, as the second MAC result data MAC_RST(2), the N^(th) MAC data MAC(N)[15:0] generated by the first to N^(th) MAC operations on the weight data W(2.1)˜W(2.N) arrayed in the second row of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. Similarly, the (M−1)^(th) multiple operation circuit 610(M−2) may output, as the (M−1)^(th) MAC result data MAC_RST(M−1), the N^(th) MAC data MAC(N)[15:0] generated by the first to N^(th) MAC operations on the weight data W((M−1).1)˜W((M−1).N) arrayed in the (M−1)^(th) row of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220. Finally, the M^(th) multiple operation circuit 610(M−1) may output, as the M^(th) MAC result data MAC_RST(M), the N^(th) MAC data MAC(N)[15:0] generated by the first to N^(th) MAC operations on the weight data W(M.1)˜W(M.N) arrayed in the M^(th) row of the weight matrix 210 and the vector data V(1)˜V(N) of the vector matrix 220.
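
The second MAC operation mode can be sketched the same way: each circuit keeps its own latched MAC data and, at every step, receives one weight from its column position while the same vector element is applied to all circuits. This is again a float-only sketch with a hypothetical function name, assuming (as stated for FIG. 18) that the number of circuits equals the number M of weight-matrix rows.

```python
from typing import List, Sequence


def mac_second_mode(weights: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Float-only sketch of the second MAC operation mode (FIG. 18).

    Circuit 610(k) handles row k+1 of the weight matrix.
    """
    m, n = len(weights), len(vector)
    if any(len(row) != n for row in weights):
        raise ValueError("each weight row needs one entry per vector element")
    latch = [0.0] * m  # MAC data latched in each of the M circuits
    for j in range(n):
        # Step j+1: column CW(j+1) goes to the M circuits; V(j+1) goes to all.
        column = [weights[k][j] for k in range(m)]
        for k in range(m):
            latch[k] = latch[k] + column[k] * vector[j]  # MAC(j+1) for row k+1
    return latch  # MAC(N) of circuit 610(k) = MAC_RST(k+1)
```

After N steps, each latch holds one row's dot product, so the M circuits together produce the whole result matrix 230 rather than a single entry.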

FIG. 19 illustrates a PIM device 700 according to an embodiment of the present disclosure. Referring to FIG. 19, the PIM device 700 may include “L”-number of memory banks BK(0)˜BK(L−1) (i.e., first to L^(th) memory banks 710(0)˜710(L−1)), “L”-number of MAC operators MAC(0)˜MAC(L−1) (i.e., first to L^(th) MAC operators 720(0)˜720(L−1)), a global buffer (GB) 730, and a command decoder 740 (where “L” is a natural number equal to or greater than two). In an embodiment, each of the memory banks 710(0)˜710(L−1) may constitute one MAC unit together with one of the MAC operators 720(0)˜720(L−1). The MAC operator of a given MAC unit may receive the weight data from the memory bank of the same MAC unit. For example, the first memory bank 710(0) and the first MAC operator 720(0) may constitute a first MAC unit, in which case the first MAC operator 720(0) may receive the weight data from the first memory bank 710(0). The configuration and operation of each of the MAC operators 720(0)˜720(L−1) may be the same as those of the MAC operator 600 described with reference to FIGS. 16 and 17. In such a case, each of the MAC operators 720(0)˜720(L−1) included in the PIM device 700 may perform the MAC operation in the first MAC operation mode described with reference to FIG. 17.

The global buffer 730 may be configured to transmit the vector data used for the MAC operation to the MAC operators 720(0)˜720(L−1). To do so, the global buffer 730 may receive the vector data from a controller (not shown) and store the vector data therein in response to a request output from a host (not shown). In an embodiment, the global buffer 730 may transmit the vector data to the MAC operators 720(0)˜720(L−1) through a global input/output (I/O) line GIO. The vector data output from the global buffer 730 may be transmitted to each of the MAC operators 720(0)˜720(L−1).

The command decoder 740 may receive a command CMD from an external device, for example, a controller. The command decoder 740 may decode the command CMD to generate and output control signals such as a first selection signal SS1, a second selection signal SS2, a third selection signal SS3, and an update signal UPDATE. Although not shown in FIG. 19, the command decoder 740 may also output additional control signals, such as a read signal and a write signal, for accessing the memory banks 710(0)˜710(L−1) and the global buffer 730. As described with reference to FIGS. 16 to 18, the first to third selection signals SS1˜SS3 and the update signal UPDATE may control the arithmetic operations or calculations of the MAC operators 720(0)˜720(L−1).

FIG. 20 illustrates an example of the MAC operation performed by the PIM device 700 illustrated in FIG. 19. As described with reference to FIG. 19, the MAC operators 720(0)˜720(L−1) of the PIM device 700 may perform the MAC operation in the first MAC operation mode described with reference to FIG. 17. The present embodiment will be described in conjunction with the MAC operation performed by the first MAC operator 720(0) and the first memory bank 710(0). The following description may be equally applied to the MAC operation of each of the second to L^(th) MAC operators 720(1)˜720(L−1).

Referring to FIG. 20, the first MAC operator 720(0) may include “N”-number of multiple operation circuits (i.e., first to N^(th) multiple operation circuits 610(0)˜610(N−1)). To perform the MAC operation, the first MAC operator 720(0) may receive the weight data W(1.1)˜W(1.N) from the first memory bank 710(0) and the vector data V(1)˜V(N) from the global buffer (730 of FIG. 19). The “N” sets of weight data W(1.1)˜W(1.N) output from the first memory bank 710(0) may be transmitted to the first to N^(th) multiple operation circuits 610(0)˜610(N−1) of the first MAC operator 720(0), respectively. The “N” sets of vector data V(1)˜V(N) output from the global buffer (730 of FIG. 19) may also be transmitted to the first to N^(th) multiple operation circuits 610(0)˜610(N−1) of the first MAC operator 720(0), respectively. The first MAC operator 720(0) may perform the MAC operation in the first MAC operation mode. As described with reference to FIG. 17, the first MAC operator 720(0) may thereby output the N^(th) MAC data MAC(N), generated by the N^(th) multiple operation circuit 610(N−1), as the N^(th) data IY(N−1) of the second result data IY. The N^(th) MAC data MAC(N) output from the first MAC operator 720(0) may correspond to the first MAC result data MAC_RST(1) of the result matrix.
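
The PIM device 700 can be pictured as a list of MAC units, each pairing a bank's weight row with a MAC operator that runs the first MAC operation mode, while the global buffer broadcasts one vector to all units. This is a hypothetical float-only model: the names `MacUnit` and `pim700_mac`, and the assumption that each bank supplies exactly one weight row per pass, are illustrative only and are not taken from the source.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MacUnit:
    """Hypothetical model of one MAC unit: memory bank 710(i) + MAC operator 720(i)."""
    bank_weights: List[float]  # assumed: one weight row read from this bank

    def run(self, vector: Sequence[float]) -> float:
        if len(self.bank_weights) != len(vector):
            raise ValueError("weight row and vector lengths differ")
        # Circuits 610(0)..610(N-1) latch W * V in parallel (first operation mode) ...
        latched = [w * v for w, v in zip(self.bank_weights, vector)]
        # ... then ripple-add from IY-1(0) = 0 (third operation mode).
        iy = 0.0
        for wv in latched:
            iy += wv
        return iy  # IY(N-1): one MAC result per unit


def pim700_mac(units: List[MacUnit], gb_vector: Sequence[float]) -> List[float]:
    # The global buffer 730 drives the same vector data over GIO to every MAC operator.
    return [unit.run(gb_vector) for unit in units]
```

Because the vector is shared and only the weights differ per bank, adding banks (larger L) yields more result entries per pass without changing the global-buffer traffic in this model.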

FIG. 21 illustrates a PIM device 800 according to another embodiment of the present disclosure. Referring to FIG. 21, the PIM device 800 may include “L”-number of memory banks BK(0)˜BK(L−1) (i.e., first to L^(th) memory banks 810(0)˜810(L−1)), “L”-number of multiple operation circuits MOC(0)˜MOC(L−1) (i.e., first to L^(th) multiple operation circuits 820(0)˜820(L−1)), a global buffer (GB) 830, and a command decoder 840 (where “L” is a natural number equal to or greater than two). In the PIM device 800, each of the first to L^(th) multiple operation circuits 820(0)˜820(L−1) may have the same configuration as the multiple operation circuit 100 described with reference to FIG. 1. Thus, each of the first to L^(th) multiple operation circuits 820(0)˜820(L−1) may selectively perform the MAC operation, the EW multiplying calculation, the EW adding calculation, or the accumulative adding calculation. When the first to L^(th) multiple operation circuits 820(0)˜820(L−1) perform the MAC operation, each of the first to L^(th) memory banks 810(0)˜810(L−1) may constitute one MAC unit together with one of the first to L^(th) multiple operation circuits 820(0)˜820(L−1). The multiple operation circuit 820 of a given MAC unit may receive the weight data used for the MAC operation from the memory bank of the same MAC unit. For example, the first memory bank 810(0) and the first multiple operation circuit 820(0) may constitute a first MAC unit, in which case the first multiple operation circuit 820(0) may receive the weight data from the first memory bank 810(0). When the PIM device 800 performs the MAC operation, the first to L^(th) multiple operation circuits 820(0)˜820(L−1) may perform the MAC operations in the second MAC operation mode described with reference to FIG. 18.

The global buffer 830 may be configured to transmit the vector data used for the MAC operation to the first to L^(th) multiple operation circuits 820(0)˜820(L−1). To this end, the global buffer 830 may receive the vector data from a controller (not shown) and store the vector data therein in response to a request output from a host (not shown). In an embodiment, the global buffer 830 may transmit the vector data to the first to L^(th) multiple operation circuits 820(0)˜820(L−1) through a global input/output (I/O) line GIO. The vector data output from the global buffer 830 may be transmitted to each of the first to L^(th) multiple operation circuits 820(0)˜820(L−1).

The command decoder 840 may receive a command CMD from an external device, for example, a controller. The command decoder 840 may decode the command CMD to generate and output control signals such as a first selection signal SS1, a second selection signal SS2, a third selection signal SS3, and an update signal UPDATE. Although not shown in FIG. 21, the command decoder 840 may also output additional control signals, such as a read signal and a write signal, for accessing the memory banks 810(0)˜810(L−1) and the global buffer 830. As described with reference to FIGS. 16 to 18, the first to third selection signals SS1˜SS3 and the update signal UPDATE may control the arithmetic operations or calculations performed by the multiple operation circuits 820(0)˜820(L−1).
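How the decoded selection signals reconfigure one multiple operation circuit can be sketched behaviourally. The sketch follows the selector wiring recited in claims 17 to 22 and the signal levels recited in claims 24 to 27; the 0/1 encoding of the first and second logic levels, the mode names, and the class `MultipleOperationCircuit` are hypothetical, and the encoding of the command CMD is not modelled. It also assumes that the latch keeps its value when SS3 is inactivated (accumulating calculation), since that mode only drives the second result data.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FIRST, SECOND = 0, 1  # hypothetical encoding of the "first" and "second" logic levels

# (SS1, SS2, SS3) per operation, from claims 24 to 27; None marks an inactivated signal.
# An inactivated signal is evaluated like SECOND below; the useful outputs do not depend on it.
MODES: Dict[str, Tuple[Optional[int], Optional[int], Optional[int]]] = {
    "mac": (SECOND, SECOND, FIRST),
    "ew_mul": (SECOND, None, SECOND),
    "ew_add": (None, FIRST, FIRST),
    "accumulate": (FIRST, SECOND, None),
}


@dataclass
class MultipleOperationCircuit:
    latch: float = 0.0  # feedback data held by the latch circuit

    def step(self, mode: str, a: float, b: float,
             first_result: float = 0.0) -> Tuple[float, float]:
        """Evaluate the datapath once and, unless SS3 is inactivated, latch the
        fourth selector's output (one UPDATE transition).
        Returns (second result data, operation result data)."""
        ss1, ss2, ss3 = MODES[mode]
        product = a * b                                   # multiplier
        sel1 = first_result if ss1 == FIRST else product  # first selector
        in3 = a if ss2 == FIRST else sel1                 # second selector -> adder input
        in4 = b if ss2 == FIRST else self.latch           # third selector -> adder input
        addition = in3 + in4                              # adder
        if ss3 is not None:  # assumption: latch is held when SS3 is inactivated
            self.latch = addition if ss3 == FIRST else sel1  # fourth selector -> latch
        return addition, self.latch
```

Chaining an `ew_mul` step (latching W·V) with an `accumulate` step (adding the incoming result data to the latched product) appears to reproduce the per-circuit behaviour recited in claim 7 for the first MAC operation mode, while repeating `mac` steps reproduces the iterative second MAC operation mode.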

FIG. 22 illustrates an example of the MAC operation performed by the PIM device 800 illustrated in FIG. 21. As described with reference to FIG. 21, the multiple operation circuits 820(0)˜820(L−1) of the PIM device 800 may perform the MAC operation in the second MAC operation mode described with reference to FIG. 18. Referring to FIG. 22, the first multiple operation circuit 820(0) may sequentially receive the weight data W(1.1)˜W(1.N) from the first memory bank 810(0) to perform the MAC operation. In addition, the first multiple operation circuit 820(0) may sequentially receive the vector data V(1)˜V(N) from the global buffer (830 of FIG. 21).

Specifically, the first multiple operation circuit 820(0) may receive the weight data W(1.1) and the vector data V(1) from the first memory bank 810(0) and the global buffer 830, respectively, to perform the first MAC operation. Next, the first multiple operation circuit 820(0) may receive the weight data W(1.2) and the vector data V(2) from the first memory bank 810(0) and the global buffer 830, respectively, to perform the second MAC operation. Subsequently, the first multiple operation circuit 820(0) may receive the weight data W(1.3) and the vector data V(3) from the first memory bank 810(0) and the global buffer 830, respectively, to perform the third MAC operation. In this way, the MAC operation may be iteratively performed until the N^(th) MAC operation is performed for the weight data W(1.N), located at the cross point of the first row and the N^(th) column of the weight matrix, and the vector data V(N) in the N^(th) row of the vector matrix. After the N^(th) MAC operation is performed, the first multiple operation circuit 820(0) may output the first MAC data MAC(1), corresponding to a result of the MAC operation for the weight data W(1.1)˜W(1.N) and the vector data V(1)˜V(N), as the first MAC result data MAC_RST(1). In the same way, the remaining multiple operation circuits (i.e., the second to L^(th) multiple operation circuits 820(1)˜820(L−1)) may also perform the MAC operations to generate and output the second to L^(th) MAC data MAC(2)˜MAC(L) as the second to L^(th) MAC result data MAC_RST(2)˜MAC_RST(L), respectively.
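The second MAC operation mode of the PIM device 800 can be sketched as "L" independent accumulators that receive the same vector datum at each step. This is a hypothetical behavioural model, not the device itself: it assumes ideal real-valued arithmetic and that all "L" circuits step in lockstep, and the function name and the list-based representation of the memory banks and the global buffer are illustrative only.

```python
from typing import List, Sequence


def pim_second_mac_mode(weight_banks: Sequence[Sequence[float]],
                        global_buffer: Sequence[float]) -> List[float]:
    """Hypothetical model: weight_banks[k] holds row k+1 of the weight matrix
    (stored in memory bank k); global_buffer holds V(1)..V(N), which is
    broadcast to every multiple operation circuit over the GIO line."""
    n = len(global_buffer)
    if any(len(row) != n for row in weight_banks):
        raise ValueError("each weight row must have N elements")
    latches = [0.0] * len(weight_banks)  # one latch per multiple operation circuit
    for j in range(n):                    # j-th MAC operation, all circuits in parallel
        v = global_buffer[j]              # same vector datum sent to every circuit
        for k, row in enumerate(weight_banks):
            # multiply, add the feedback data, and latch the addition result
            latches[k] = latches[k] + row[j] * v
    return latches                        # MAC_RST(1)..MAC_RST(L)
```

With weight rows `[1.0, 2.0, 3.0]` and `[4.0, 5.0, 6.0]` and vector data `[1.0, 2.0, 3.0]`, the sketch returns `[14.0, 32.0]`, i.e., MAC_RST(1) and MAC_RST(2). Unlike the cascaded first mode, no result data pass between circuits; each circuit feeds its own latch back to its adder.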

A limited number of possible embodiments for the present teachings have been presented above for illustrative purposes. Those of ordinary skill in the art will appreciate that various modifications, additions, and substitutions are possible. While this patent document contains many specifics, these should not be construed as limitations on the scope of the present teachings or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

What is claimed is:
 1. A multiplication and accumulation (multiplication/accumulation) (MAC) operator comprising: a plurality of multiple operation circuits configured to receive plural sets of first input data and plural sets of second input data to generate and output plural sets of operation result data and result data, wherein the plural sets of first input data are transmitted to the plurality of multiple operation circuits, respectively, wherein the plural sets of second input data are transmitted to the plurality of multiple operation circuits, respectively, wherein the plural sets of operation result data are output from the plurality of multiple operation circuits, respectively, and wherein each of the plurality of multiple operation circuits is configured to perform an arithmetic operation in a first operation mode, a second operation mode, or a third operation mode according to first to third selection signals.
 2. The MAC operator of claim 1, wherein each of the plurality of multiple operation circuits is configured to receive first result data to generate and output second result data; and wherein the plurality of multiple operation circuits are cascaded in series such that the second result data output from an (i−1)^(th) multiple operation circuit of the plurality of multiple operation circuits correspond to the first result data input to an i^(th) multiple operation circuit of the plurality of multiple operation circuits, wherein i is a natural number which is equal to or greater than two.
 3. The MAC operator of claim 2, wherein the first result data input to a first multiple operation circuit corresponding to a foremost one of the plurality of multiple operation circuits are fixed to have a value of zero.
 4. The MAC operator of claim 3, wherein the second result data output from a last one of the plurality of multiple operation circuits correspond to the result data.
 5. The MAC operator of claim 2, wherein the plurality of multiple operation circuits perform a MAC operation corresponding to a matrix-vector multiplying calculation of a weight matrix having first to M^(th) rows and first to N^(th) columns and a vector matrix having first to N^(th) rows and a single column to generate a result matrix having first to M^(th) rows and a single column, wherein “M” and “N” are natural numbers which are equal to or greater than two; and wherein the MAC operation is performed in a first MAC operation mode or a second MAC operation mode.
 6. The MAC operator of claim 5, wherein each of the plurality of multiple operation circuits performs the MAC operation in the first MAC operation mode using “N” sets of the weight data arrayed in the first row of the weight matrix and “N” sets of the vector data arrayed in the single column of the vector matrix as input data to generate MAC result data located in the first row of the result matrix.
 7. The MAC operator of claim 6, wherein the (i−1)^(th) multiple operation circuit is configured to perform a multiplying calculation of one set of the “N” sets of weight data and one set of the “N” sets of vector data to latch the result data of the multiplying calculation, is configured to add the latched result data to the first result data input to the (i−1)^(th) multiple operation circuit to generate the second result data, and is configured to transmit the second result data to the i^(th) multiple operation circuit; and wherein the operation of the (i−1)^(th) multiple operation circuit is sequentially performed from the first multiple operation circuit to the last multiple operation circuit.
 8. The MAC operator of claim 7, wherein the first result data input to the first multiple operation circuit of the plurality of multiple operation circuits are fixed to have a value of zero; and wherein the second result data generated by the last multiple operation circuit of the plurality of multiple operation circuits are output as MAC result data located in one of the first to M^(th) rows of the result matrix.
 9. The MAC operator of claim 5, wherein the plurality of multiple operation circuits perform the MAC operation in the second MAC operation mode using “N” sets of the weight data arrayed in each of the first to M^(th) rows of the weight matrix and “N” sets of the vector data arrayed in the single column of the vector matrix as input data to generate first to M^(th) MAC result data located in respective ones of the first to M^(th) rows of the result matrix.
 10. The MAC operator of claim 9, wherein each of the plurality of multiple operation circuits is configured to iteratively perform the MAC operation for each set of the “N” sets of weight data in one row of the weight matrix and each set of the “N” sets of vector data in the single column of the vector matrix “N” times.
 11. The MAC operator of claim 10, wherein each of the plurality of multiple operation circuits outputs the MAC result data, which are generated by the MAC operation performed using the N^(th) set of weight data and the N^(th) set of vector data as input data, as the operation result data.
 12. The MAC operator of claim 2, wherein each of the plurality of multiple operation circuits includes: a multiplier configured to perform a multiplying calculation of first input data and second input data to generate and output multiplication result data; an adder configured to perform an adding calculation of third input data and fourth input data to generate and output addition result data; a latch circuit configured to latch fifth input data input to an input terminal of the latch circuit to generate and output feedback data; and a plurality of selectors configured to change transmission paths of the first result data, the first input data, the second input data, the multiplication result data, and the addition result data according to the first operation mode, the second operation mode, or the third operation mode.
 13. The MAC operator of claim 12, wherein the plurality of selectors are configured such that the multiplication result data are transmitted to become the third input data and the feedback data are transmitted to become the fourth input data when the MAC operation is performed in the first operation mode.
 14. The MAC operator of claim 12, wherein the plurality of selectors are configured such that the multiplication result data are transmitted to the input terminal of the latch circuit when an element-wise (EW) multiplying calculation is performed in the second operation mode.
 15. The MAC operator of claim 12, wherein the plurality of selectors are configured such that the first input data are transmitted to become the third input data and the second input data are transmitted to become the fourth input data when an element-wise (EW) adding calculation is performed in the second operation mode.
 16. The MAC operator of claim 12, wherein the plurality of selectors are configured such that the first result data are transmitted to become the third input data and the feedback data are transmitted to become the fourth input data when an accumulative adding calculation is performed in the third operation mode.
 17. The MAC operator of claim 12, wherein each of the plurality of multiple operation circuits receives the first to third selection signals; and wherein the plurality of selectors include: a first selector configured to receive the first result data and the multiplication result data to output one of the first result data and the multiplication result data in response to the first selection signal; a second selector configured to receive the first input data and the output data of the first selector to output one of the first input data and the output data of the first selector in response to the second selection signal; a third selector configured to receive the second input data and the feedback data to output one of the second input data and the feedback data in response to the second selection signal; and a fourth selector configured to receive the output data of the first selector and the addition result data to output one of the output data of the first selector and the addition result data in response to the third selection signal.
 18. The MAC operator of claim 17, wherein each of the plurality of multiple operation circuits further includes an inverter that inverts a level of the third selection signal to transmit the inverted signal of the third selection signal to the fourth selector.
 19. The MAC operator of claim 18, wherein the first selector is configured to output the first result data when the first selection signal has a first logic level and is configured to output the multiplication result data when the first selection signal has a second logic level; wherein the second selector is configured to output the first input data when the second selection signal has the first logic level and is configured to output the output data of the first selector when the second selection signal has the second logic level; wherein the third selector is configured to output the second input data when the second selection signal has the first logic level and is configured to output the feedback data when the second selection signal has the second logic level; and wherein the fourth selector is configured to output the output data of the first selector when the third selection signal has the second logic level and is configured to output the addition result data when the third selection signal has the first logic level.
 20. The MAC operator of claim 19, wherein the output data of the second selector correspond to the third input data; wherein the output data of the third selector correspond to the fourth input data; and wherein the output data of the fourth selector correspond to the fifth input data.
 21. The MAC operator of claim 20, wherein the latch circuit included in each of the plurality of multiple operation circuits receives an update signal through a clock terminal; and wherein the latch circuit is synchronized with logic level transition of the update signal to latch the fifth input data input to the input terminal of the latch circuit and to output the latched data of the fifth input data as the feedback data through an output terminal of the latch circuit.
 22. The MAC operator of claim 21, wherein each of the plurality of multiple operation circuits further includes: a first output line through which the addition result data, which are output from the adder, are transmitted to provide the second result data; and a second output line through which the latched data of the fifth input data, which are output from the latch circuit, are transmitted to provide the operation result data.
 23. The MAC operator of claim 22, wherein in the first operation mode, the MAC operation is performed by a matrix-vector multiplying calculation of weight data and vector data; wherein in the second operation mode, an element-wise (EW) multiplying calculation is performed by a matrix-scalar multiplying calculation of the weight data and a constant; wherein in the second operation mode, an element-wise (EW) adding calculation is performed by a matrix adding calculation of a first matrix and a second matrix; and wherein in the third operation mode, an accumulating calculation is performed by adding the first result data to the latched data.
 24. The MAC operator of claim 23, wherein each of the plurality of multiple operation circuits is configured to perform the MAC operation in the first operation mode in response to the first and second selection signals having the second logic level and the third selection signal having the first logic level.
 25. The MAC operator of claim 23, wherein each of the plurality of multiple operation circuits is configured to perform the EW multiplying calculation in the second operation mode in response to the first and third selection signals having the second logic level while the second selection signal is inactivated.
 26. The MAC operator of claim 23, wherein each of the plurality of multiple operation circuits is configured to perform the EW adding calculation in the second operation mode in response to the second and third selection signals having the first logic level while the first selection signal is inactivated.
 27. The MAC operator of claim 23, wherein each of the plurality of multiple operation circuits is configured to perform the accumulating calculation in the third operation mode in response to the first selection signal having the first logic level and the second selection signal having the second logic level while the third selection signal is inactivated.
 28. The MAC operator of claim 17, wherein the first selector has a first input terminal receiving the first result data, a second input terminal coupled to an output terminal of the multiplier, a selection terminal receiving the first selection signal, and an output terminal; wherein the second selector has a first input terminal receiving the first input data, a second input terminal coupled to the output terminal of the first selector, a selection terminal receiving the second selection signal, and an output terminal coupled to a first input terminal of the adder; wherein the third selector has a first input terminal receiving the second input data, a second input terminal coupled to an output terminal of the latch circuit, a selection terminal receiving the second selection signal, and an output terminal coupled to a second input terminal of the adder; and wherein the fourth selector has a first input terminal coupled to the output terminal of the first selector, a second input terminal coupled to an output terminal of the adder, a selection terminal receiving the third selection signal, and an output terminal coupled to an input terminal of the latch circuit.
 29. The MAC operator of claim 12, wherein each set of the plural sets of first input data has a floating-point format including a first sign datum, first exponent data, and first mantissa data; wherein each set of the plural sets of second input data has a floating-point format including a second sign datum, second exponent data, and second mantissa data; and wherein the multiplier includes: a sign processing circuit configured to perform a logical exclusive OR operation of the first sign datum and the second sign datum to generate and output a third sign datum of the multiplication result data; an exponent processing circuit configured to add the first exponent data to the second exponent data to generate exponent addition result data and configured to subtract a bias value from the exponent addition result data; a mantissa processing circuit configured to perform an adding calculation of the first mantissa data and the second mantissa data; and a normalizer configured to convert exponent data output from the exponent processing circuit into normalized third exponent data of the multiplication result data and configured to convert mantissa data output from the mantissa processing circuit into normalized third mantissa data of the multiplication result data.
 30. The MAC operator of claim 29, wherein the third input data have a floating-point format including a third sign datum, third exponent data, and third mantissa data; wherein the fourth input data have a floating-point format including a fourth sign datum, fourth exponent data, and fourth mantissa data; and wherein the adder includes: a difference circuit configured to compare the third exponent data and the fourth exponent data to generate and output maximum exponent data, exponent difference data, and a selection signal; a 2's complement processing circuit configured to output the third mantissa data or 2's complement data of the third mantissa data as first interim mantissa data according to a value of the third sign datum and configured to output the fourth mantissa data or 2's complement data of the fourth mantissa data as second interim mantissa data according to a value of the fourth sign datum; a shifting circuit configured to perform a shifting operation for the first interim mantissa data or the second interim mantissa data according to a logic level of the selection signal and to output mantissa data not to be shifted as third interim mantissa data and mantissa data to be shifted as shifted interim mantissa data; an adding circuit configured to add the non-shifted third interim mantissa data to the shifted interim mantissa data to generate a sign datum of the addition result data and addition mantissa data, configured to output the sign datum of the addition result data through a first output terminal of the adding circuit, and configured to output the addition mantissa data or 2's complement data of the addition mantissa data as third interim mantissa data through a second output terminal of the adding circuit according to a value of the sign datum of the addition result data; and a normalizer configured to normalize the maximum exponent data and the third interim mantissa data to generate exponent data and mantissa data of the addition result data.
 31. The MAC operator of claim 12, wherein each set of the plural sets of first input data has a floating-point format including a first sign datum, first exponent data, and first mantissa data; wherein each of the plural sets of second input data has a floating-point format including a second sign datum, second exponent data, and second mantissa data; wherein the number of bits included in the multiplication result data output from the multiplier is twice the number of bits included in the first input data or the second input data; wherein the number of bits included in the addition result data output from the adder is equal to the number of bits included in the multiplication result data; and wherein the number of bits included in the feedback data output from the latch circuit is equal to the number of bits included in the addition result data.
 32. The MAC operator of claim 31, wherein the multiplier includes a mantissa multiplier that performs a multiplying calculation of first data including the first mantissa data of the first input data and an implied bit and second data including the second mantissa data of the second input data and the implied bit to generate and output the result data of the multiplying calculation; and wherein the result data output from the mantissa multiplier provide mantissa data of the multiplication result data output from the multiplier.
 33. The MAC operator of claim 32, wherein each of the plurality of multiple operation circuits further includes a normalizer coupled to an output terminal of the latch circuit; and wherein the normalizer is configured to normalize the feedback data output from the latch circuit to generate data having the same number of bits as the first input data or the second input data.
 34. The MAC operator of claim 33, wherein the normalizer includes: a floating-point shifter configured to receive mantissa data of output data of the latch circuit to shift a binary floating-point of the mantissa data by one bit toward a most significant bit (MSB) of the mantissa data to generate and output mantissa data having the shifted binary floating-point; a multiplexer configured to selectively output the mantissa data of the output data of the latch circuit or output data of the floating-point shifter according to a logic level of an MSB of the mantissa data of the output data of the latch circuit; a round processor configured to remove certain bits including the implied bit from output data of the multiplexer and configured to perform a rounding operation to generate and output mantissa data having the same number of bits as the first input data or the second input data; and an adder configured to add the MSB datum of the mantissa data of the output data of the latch circuit to exponent data of the output data of the latch circuit to generate and output exponent data having the same number of bits as the first input data or the second input data.
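The floating-point multiplication recited in claims 29 and 32, followed by the normalization recited in claim 34 (as it would apply in the EW multiplying path, where the latched data are the multiplication result data), can be sketched at the bit level. This is a hypothetical sketch, not the claimed circuit: it assumes a bfloat16-like format (1 sign bit, 8 exponent bits, 7 mantissa bits, bias 127), handles only normal finite operands, uses round-to-nearest-even because the claims do not specify the rounding rule, and the function name `fp_multiply` is illustrative only.

```python
def fp_multiply(x_bits: int, y_bits: int, exp_bits: int = 8, man_bits: int = 7) -> int:
    """Hypothetical bit-level model: multiply two normal floating-point values
    (bfloat16-like by default) and normalize/round back to the input width."""
    bias = (1 << (exp_bits - 1)) - 1
    man_mask = (1 << man_bits) - 1
    exp_mask = (1 << exp_bits) - 1

    def unpack(bits: int):
        sign = (bits >> (exp_bits + man_bits)) & 1
        exp = (bits >> man_bits) & exp_mask
        if exp == 0 or exp == exp_mask:
            raise ValueError("sketch handles normal finite numbers only")
        return sign, exp, bits & man_mask

    s1, e1, m1 = unpack(x_bits)
    s2, e2, m2 = unpack(y_bits)

    sign = s1 ^ s2              # sign processing: XOR of the two sign data
    exp = e1 + e2 - bias        # exponent processing: add, then subtract the bias
    # mantissa multiplier: operands carry the implied leading 1, so the
    # product has 2 * (man_bits + 1) bits (twice the mantissa width)
    prod = ((1 << man_bits) | m1) * ((1 << man_bits) | m2)

    # normalizer: if the product MSB is 1 the value lies in [2, 4), so the binary
    # point moves one bit toward the MSB and the MSB datum is added to the exponent
    msb = (prod >> (2 * man_bits + 1)) & 1
    exp += msb
    shift = man_bits + msb      # low-order bits removed by the round processor
    keep = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and keep & 1):  # assumed: round to nearest, ties to even
        keep += 1
    if keep >> (man_bits + 1):  # rounding carried into a new MSB: renormalize
        keep >>= 1
        exp += 1
    if not 0 < exp < exp_mask:
        raise OverflowError("exponent out of range; not handled in this sketch")
    # drop the implied bit and repack sign, exponent, and mantissa
    return (sign << (exp_bits + man_bits)) | (exp << man_bits) | (keep & man_mask)
```

For instance, under these assumptions `fp_multiply(0x3FC0, 0x3FC0)` (1.5 × 1.5) sets the product MSB, so the normalizer increments the exponent and the function returns `0x4010`, the bfloat16 encoding of 2.25.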